Coupling between denaturation and chain conformations in DNA: stretching, bending, torsion and finite size effects

We develop further a statistical model coupling denaturation and chain conformations in DNA
(Palmeri J, Manghi M and Destainville N 2007 Phys. Rev. Lett. 99 088103). Our Discrete Helical 
Wormlike Chain model takes explicitly into account the three elastic degrees of freedom, namely 
stretching, bending and torsion of the polymer. Integrating out these external variables shows that the
conformational entropy contributes to bubble nucleation (opening of base-pairs), which sheds light
on the DNA melting mechanism. Because the values of monomer length, bending and torsional 
moduli differ significantly in dsDNA and ssDNA, these effects are important. Moreover, we explore 
in this context the role of an additional loop entropy and analyze finite-size effects in an experimental 
context where polydA-polydT is clamped by two G-C strands, as well as for free polymers. 

PACS numbers: 87.10.+e, 87.15.Ya, 82.39.Pj



I. INTRODUCTION 



The study of DNA physical properties is seeing intense activity from both a theoretical [1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18] and an experimental perspective [19, 20, 21, 22, 23, 24]. The first theoretical
and experimental studies were published several decades ago, but the recent development of experimental techniques
enabling one to address DNA properties at the single molecule level has brought a significant renewal of interest in 
the field. These techniques provide not only average properties, like their former bulk counterparts, but also the statistics of
fluctuations around the average values. Single molecule setups range from magnetic and optical tweezers [55J [55]
or Tethered Particle Motion apparatus [3H1 1311 1321 133], to Atomic Force Microscopy [HJ 131]. They give access to
huge amounts of data concerning DNA physical properties such as bending, stretching, and twisting elasticities or
conformational dynamics [30l [3TJ [35j |36l |32]. In parallel, the genomic revolution is leading to the elucidation of a growing number
of biological functions involving nucleic acids. A pressing demand follows for reliable and precise physical models,
able to validate the many hypotheses emerging from molecular biology or microscopy experiments. This constitutes a
double motivation for theoreticians to refine the existing microscopic DNA models: accounting for the new, accurate 
physics experiments; and validating (or invalidating) the physical assumptions underlying the proposed biological 
mechanisms.

Denaturation is one of the intimate physical features of DNA that are thought to be involved in many critical
cellular functions, such as transcription, replication and protein binding, but that are not fully understood. Even though DNA
unwinding at the cellular level is generally an active process driven by energy-consuming enzymes such as helicases [38],
understanding the subtle statistical mechanics of this bio-polymer is an essential first step towards the elucidation of
more complex, active mechanisms. Furthermore, the spontaneous opening of base-pairs due to thermal activation is
likely to play a direct role in several biological processes. Recently, Yan and Marko [12] have for example proposed 
that coupling the DNA elasticity to a minimal model of base-pair melting can account for the increased cyclization 
probability observed by Cloutier and Widom [39]: even if it is rare, local denaturation increases short-range flexibility
because single-stranded DNA (ssDNA) is nearly two orders of magnitude more flexible than double-stranded DNA
(dsDNA). This increased flexibility should play a role wherever the polymer must be bent or looped on length
scales shorter than its persistence length (typically equal to 50 nm). In the nucleosome, for instance, DNA is wrapped around histones,
the diameter of which is about 11 nm [40].

In order to get more insight into this coupling between denaturation and elasticity, we recently proposed a more
refined coupled, non-linear model, where the internal states of base pairs (open or closed) are described by a
one-dimensional Ising model, whereas the chain configurations are encoded by a one-dimensional Heisenberg one taking
into account DNA bending [17, 18]. By solving this model exactly, we demonstrated that taking into account this
coupling between internal and external degrees of freedom enables the prediction of the modifications of elastic
properties when increasing the temperature: Ising parameters are renormalized by temperature in such a way that
DNA denaturation is accompanied by a collapse of the chain persistence length. Following this route, we were able
for the first time to write the melting temperature $T_m$ as a function of microscopic parameters only (whereas it was a
fit parameter in previous models), and to give a new description of boundary and finite size effects.

However, our model was minimal in the sense that only bending was taken into account. Torsion is also known to
play a role in elasticity because a strong flexion of an elastic rod is in general accompanied by a torsion [41] which
decreases the energy cost of the deformation. Similarly, stretching of base pairs ought to be included in a complete
elastic model. In the present paper, we systematically explore these effects in detail, by proposing an exactly
solvable Discrete Helical Wormlike Chain model (DHWC), and predicting how Ising parameters are renormalized in
this context (Section II).

In Section III, we investigate the influence of the chain length (or finite-size effects) on melting profiles. At the
experimental level, it has been shown in [TJ [55] that they are measurable even for DNA made of several thousand 
base-pairs. These effects are usually measured for polydA-polydT flanked by more stable G-C rich strands. Hence 
we modify our model to account for such clamped boundary conditions. In other models of denaturation [21 l4"2l |4"3"] . 
chain configurations are partially incorporated via a so-called "loop entropy" that takes into account the entropic cost 
of closing a denaturation bubble when it is not located at a polymer end. We investigate the role of loop entropy in 
finite clamped and free DNA chains. 



II. COUPLING BETWEEN INTERNAL AND EXTERNAL DNA'S DEGREES OF FREEDOM 

In Refs. [17, 18], we showed that the denaturation melting temperature emerges naturally by taking into account
the difference in bending rigidities of ssDNA sequences (bubbles) and dsDNA ones. Indeed, the ratio of both moduli,
$\kappa_{ds}/\kappa_{ss}$, is on the order of 50. It is at the origin of an entropic barrier which stems from the fact that in the ssDNA
state, the allowed spatial configurations for the unit tangent vectors $\mathbf{t}_i$, which describe the chain conformations, are much
more numerous, leading to a significant increase in entropy. More precisely, it has been shown that the free
energy (mostly of entropic nature) obtained by integrating out the part of the Hamiltonian which depends on the external
variables $\mathbf{t}_i$ renormalizes the bare Ising parameters $J$ and $K$, which are the energy costs of creating a domain wall
and of destacking two adjacent base-pairs, respectively. The third Ising parameter, $\mu$, which corresponds to the energy
required to break a base-pair (or "magnetic field" in a magnetism analogy), is not renormalized. In particular, the
full penalty of breaking one base-pair located in DNA's interior, $L = \mu + K$, becomes

$$L_0 \equiv \mu + K_0 = \mu + K - \frac{k_B T}{2}\left[G_0\!\left(\frac{\kappa_{ds}}{k_B T}\right) - G_0\!\left(\frac{\kappa_{ss}}{k_B T}\right)\right] \simeq \mu + K - \frac{k_B T}{2}\ln\frac{\kappa_{ds}}{\kappa_{ss}} \tag{1}$$

where $k_B T$ is the thermal energy and $G_0(x) = x - \ln\left(\frac{\sinh x}{x}\right)$. The approximation is valid in the temperature range
of interest since $\kappa_{ss} \simeq 6\,k_B T$.

In the infinitely long chain limit, the melting temperature $T_m$, defined as the temperature at which half of the
base-pairs are broken, is simply given by $L_0(T_m) = 0$. The melting temperature thus naturally emerges in this model
and is determined by the competition between the enthalpic cost of breaking base pairs (mostly hydrogen bonds
and $\pi$-overlap of carbon ring wave-functions of adjacent nucleotides, but also charge, dipolar, and van der Waals
interactions) and the entropic gain in nucleating bubbles made of very flexible single-stranded DNA chains.
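As a quick numerical illustration of the melting condition $L_0(T_m) = 0$, the following minimal sketch evaluates $L_0 = \mu + K - \frac{k_B T}{2}[G_0(\kappa_{ds}) - G_0(\kappa_{ss})]$ with $G_0(x) = x - \ln(\sinh x/x)$. The parameter values are the ones quoted in the caption of Fig. 2; the function names `G0` and `L0`, the use of molar units ($RT$ in place of $k_B T$) and the choice to treat the dimensionless moduli as fixed numbers are assumptions made here for illustration only.

```python
import math

R = 8.314462e-3  # gas constant in kJ/(mol K); R*T plays the role of k_B T per mole


def G0(x):
    """Free energy (units of k_B T) of one bending joint: G0(x) = x - ln(sinh(x)/x)."""
    return x - math.log(math.sinh(x) / x)


def L0(T, mu, K, kappa_ds, kappa_ss):
    """Renormalized cost of opening an interior base-pair, in kJ/mol.

    kappa_ds and kappa_ss are dimensionless bending moduli (units of k_B T).
    """
    return mu + K - 0.5 * R * T * (G0(kappa_ds) - G0(kappa_ss))


# Illustrative values taken from the caption of Fig. 2.
mu, K = 4.46, 0.0                # kJ/mol
kappa_ds, kappa_ss = 147.0, 5.5  # in units of k_B T
T_m = 326.4                      # K

residual = L0(T_m, mu, K, kappa_ds, kappa_ss)
log_approx = mu + K - 0.5 * R * T_m * math.log(kappa_ds / kappa_ss)
print(f"L0(T_m) with exact G0: {residual:+.4f} kJ/mol")
print(f"L0(T_m) with log form: {log_approx:+.4f} kJ/mol")
```

Both evaluations are close to zero at the quoted melting temperature, which is the sense in which $T_m$ is fixed by microscopic parameters rather than fitted.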

However, external variables other than $\mathbf{t}_i$, which also characterize the chain elasticity, may lead to a renormalization
of the parameter $L$. Clearly, two other external degrees of freedom should also be taken into account:

• many force-extension experiments have shown that the monomer size $a$ is not the same in dsDNA and ssDNA
(see the review [UJ and references therein). Indeed, the monomer size in the B-form of double-stranded DNA is
generally defined as the rise along the central axis per base-pair, which is $a_{ds} = 0.34$ nm. The generally accepted
value [THUS] of the monomer size in ssDNA is $a_{ss} = 0.71$ nm and we choose in the following $a_{ss} \approx 2a_{ds}$ |5S].

• the B-form of dsDNA is the famous double helix and a torsional energy has to be taken into account in a more
refined model. Indeed, in the continuous Helical Wormlike Chain model for DNA [48], the elastic energy of the
chain has two contributions: a bending term already taken into account in [17, 18] and an energy of torsional
deformations which in the continuum limit reads

$$E_{twist} = \frac{C}{2}\int \Omega_3^2(s)\,\mathrm{d}s \tag{2}$$

where $\Omega_3 = \mathbf{\Omega}\cdot\mathbf{e}_3$. The Darboux vector $\mathbf{\Omega}$ characterizes the rotation of the material frame, $\mathbf{e}_3$ is along the
molecular axis, and $s$ is the curvilinear index. The twist (or torsional) rigidity modulus $C$ has been measured
in torsional experiments on dsDNA [37J HH1 [SOJ 151] and is on the order of $C_{ds} \simeq 2.4$ to $4.5 \times 10^{-19}$ J.nm.
The twist rigidity of ssDNA is lower because it loses its stiff helical structure and has been evaluated to be
$C_{ss} \simeq 9 \times 10^{-20}$ J.nm [52]. The ratio $C_{ds}/C_{ss}$ is of the same order as $\kappa_{ds}/\kappa_{ss}$ and will certainly modify the Ising
parameters in a similar way as for the bending energy.



A. Discrete Helical Wormlike Chain model 



In the present work, the DNA is modeled as a fluctuating polymer chain in three-dimensional space, characterized
by the external chain variables, namely the set of $N$ bond vectors $\mathbf{t}_i$ and their orientation in space (it is thus implicitly
assumed that the monomer has a three-dimensional structure), and by an internal Ising variable $\sigma_i = \pm 1$ which models
the internal state of dsDNA, unbroken (U) or broken (B) respectively. The modeling of the base-pair internal state
by an Ising model was developed in the 1960s by Lehman, Montroll and Vedenov (see review [53] and references
therein).

We focus on the coupling of the internal variables with the external variables, which is included in the part of the Hamiltonian
treating the fluctuating chain. A material coordinate frame is defined for each monomer $i$, $\{\mathbf{e}_{\mu,i}\}_{\mu=1,2,3} = \{\hat{\mathbf{u}}_i, \hat{\mathbf{n}}_i, \hat{\mathbf{t}}_i\}$,
where $\hat{\mathbf{t}}_i$ is the unit bond vector, $\mathbf{t}_i = \mathbf{R}_{i+1} - \mathbf{R}_i = t_i\hat{\mathbf{t}}_i$, and the two other unit vectors are in the directions
of the principal axes of inertia. This triad is defined with respect to a fixed reference frame $\{\mathbf{x},\mathbf{y},\mathbf{z}\}$ through a rotation
matrix $\Lambda_i$, characterized by the Euler angles $\omega_i = (\alpha_i, \beta_i, \gamma_i)$. The evolution of the triad along the molecular chain from
monomer $i$ to monomer $i+1$ is obtained by a rotation also defined by Euler angles $(\phi_{i,i+1}, \theta_{i,i+1}, \psi_{i,i+1})$:

$$\mathbf{e}_{\mu,i+1} = \Lambda_{\mu\nu}(\phi_{i,i+1}, \theta_{i,i+1}, \psi_{i,i+1})\,\mathbf{e}_{\nu,i} \tag{3}$$

where the rotation matrix A is the product of three rotation matrices associated with each Euler angle, but can also 
be viewed as the product of two rotations of angles Oi^+i and 1 + Ej 

A.(0j ) i+i,0i,i+i,V'»,i+O = R(^i>i } i,i+l)R{^i,i+l^hi+l)Rv'ii^ ) i,i+l) W 

In the material coordinate frame $\{\mathbf{e}_{\mu,i}\}$, the bond vector $\mathbf{t}_{i+1}$ is thus defined by its spherical coordinates $(\theta_{i,i+1}, \phi_{i,i+1})$.
Moreover, the Euler angles $(\phi_{i,i+1}, \theta_{i,i+1}, \psi_{i,i+1})$ which will appear in the Hamiltonian are completely determined by
the two sets of Euler angles $\omega_i$ and $\omega_{i+1}$ through $\Lambda_{i,i+1} = \Lambda_{i+1}\cdot\Lambda_i^{-1}$.

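The configurational part of the Hamiltonian is defined as the sum of two terms

$$H[\sigma, \mathbf{t}, \psi] = H_{Ising}[\sigma] + H_{chain}[\sigma, \mathbf{t}, \psi] \tag{5}$$

where $H_{Ising}[\sigma]$ is the usual Ising Hamiltonian already defined in [17, 18] with three parameters $(\mu, J, K)$, and
$H_{chain}[\sigma, \mathbf{t}, \psi]$ is the Discrete Helical Wormlike Chain (DHWC) Hamiltonian:

$$H_{Ising}[\sigma] = -\mu\sum_{i=1}^{N}\sigma_i - \sum_{i=1}^{N-1}\left[J\,\sigma_{i+1}\sigma_i + \frac{K}{2}(\sigma_{i+1}+\sigma_i)\right] \tag{6}$$

$$H_{chain}[\sigma,\mathbf{t},\psi] = \sum_{i=1}^{N}\frac{\epsilon_i}{2}\left(|\mathbf{t}_i|^2 - a_i^2\right)^2 + \frac{1}{2}\sum_{i=1}^{N-1}\left[\kappa_{i,i+1}\left(\hat{\mathbf{t}}_{i+1}-\hat{\mathbf{t}}_i\right)^2 + 2C_{i,i+1}\left(\cos\theta_{i,i+1} - \cos\lambda_{i,i+1}\right)\right] \tag{7}$$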

The first term of (7) is a non-linear stretching term dictated by rotational and translational invariances. The values
of the Lamé coefficient $\epsilon_i$ and the monomer length $a_i$ depend on the state of the base-pair [$(\epsilon_U, a_U)$ for $\sigma_i = +1$
and $(\epsilon_B, a_B)$ for $\sigma_i = -1$]. The second term corresponds to the bending and torsional energies. The latter can be
written as $C_{i,i+1}[\mathrm{tr}\,\Lambda(0, \theta_{i,i+1}, 0) - \mathrm{tr}\,\Lambda(\phi_{i,i+1}, \theta_{i,i+1}, \psi_{i,i+1})]$, and accounts for the energy penalty associated with the
twist defined by the angle $\phi_{i,i+1} + \psi_{i,i+1}$. Indeed, the angle $\lambda$ of the rotation defined in (3) is a function of $\phi + \psi$ and
$\theta$ (indices $i, i+1$ are omitted):

$$\cos\lambda = \frac{1}{2}\left[\cos(\phi+\psi)(\cos\theta + 1) + \cos\theta - 1\right] \tag{8}$$
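The relation between the total rotation angle $\lambda$ and the Euler angles can be checked numerically. The sketch below builds the rotation as $R_z(\psi)R_y(\theta)R_z(\phi)$, extracts $\cos\lambda$ from the trace via $\mathrm{tr}\,\Lambda = 1 + 2\cos\lambda$, and compares it with $\cos\lambda = \frac{1}{2}[\cos(\phi+\psi)(\cos\theta+1) + \cos\theta - 1]$. The z-y-z Euler convention and all function names are assumptions made for this illustration; the text does not fix the axis convention in a recoverable form.

```python
import math


def rot_z(a):
    """Rotation by angle a about the z axis."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(a):
    """Rotation by angle a about the y axis."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def cos_lambda_from_trace(phi, theta, psi):
    """cos(lambda) from tr(Lambda) = 1 + 2 cos(lambda), assuming z-y-z Euler angles."""
    Lam = matmul(rot_z(psi), matmul(rot_y(theta), rot_z(phi)))
    trace = Lam[0][0] + Lam[1][1] + Lam[2][2]
    return 0.5 * (trace - 1.0)


def cos_lambda_closed_form(phi, theta, psi):
    """Closed form: cos(lambda) = [cos(phi+psi)(cos(theta)+1) + cos(theta) - 1] / 2."""
    return 0.5 * (math.cos(phi + psi) * (math.cos(theta) + 1.0)
                  + math.cos(theta) - 1.0)


for angles in [(0.1, 0.2, 0.3), (1.0, 2.0, -0.5), (2.5, 0.7, 1.9)]:
    print(angles, cos_lambda_from_trace(*angles), cos_lambda_closed_form(*angles))
```

When $\phi + \psi = 0$ the two expressions reduce to $\cos\lambda = \cos\theta$, so the torsional penalty $C(\cos\theta - \cos\lambda)$ vanishes for an untwisted joint.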

The bending $\kappa_{i,i+1}$ and torsional $C_{i,i+1}$ moduli also vary locally with the state of nearest-neighbour links [$(\kappa_U, C_U)$
for type U-U, $(\kappa_B, C_B)$ for B-B, and $(\kappa_{UB}, C_{UB})$ for U-B]. We assume in this model that all the parameters
appearing in (7) are independent of the nucleotide type. Hence we focus on homopolynucleotides. The case of sequence
dependent parameters could be handled numerically.

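Equation (7) defines our discrete version of the continuous Helical Wormlike Chain model first employed by Yamakawa
for DNA [48] and extended in several articles in the literature [5TJ [SH [55J [SB]. First, one observes that if
there is no twist, i.e. no rotation around the tangent vector $\mathbf{t}_i$, then $\phi + \psi = 0$ and, from (8), $\lambda = \theta$, so that the torsional
term vanishes. Hence if there is no twist along the chain (or if the DNA chain is modeled as linear), the
DHWC becomes the classical Discrete Wormlike Chain already developed in [17, 18]. Furthermore, the Discrete
Helical Wormlike Chain simplifies in the continuum limit, $|\mathbf{R}_{i+1} - \mathbf{R}_i| \to \Delta s$ with $\Delta s \to 0$, where $s$ is the curvilinear
index. Indeed it is straightforward to see that $\sum_{i=1}^{N-1}\kappa(\hat{\mathbf{t}}_{i+1}-\hat{\mathbf{t}}_i)^2 \to \int\kappa\,[\Omega_1^2(s)+\Omega_2^2(s)]\,\mathrm{d}s$ and, with more algebra,
that $\sum_i C\,[\mathrm{tr}\,\Lambda(0,\theta_{i,i+1},0) - \mathrm{tr}\,\Lambda(\phi_{i,i+1},\theta_{i,i+1},\psi_{i,i+1})]$ simplifies into (2), where the Darboux vector is defined
by $\mathbf{e}_{\mu,i+1} - \mathbf{e}_{\mu,i} \to \mathbf{\Omega}\times\mathbf{e}_{\mu,i}$ and $\Omega_\mu(s) = \mathbf{\Omega}\cdot\mathbf{e}_\mu(s)$. Finally, in the low temperature regime where the spin-wave
approximation is valid ($\phi+\psi \ll 1$ and $\theta \ll 1$), bending and torsional contributions reduce to quadratic terms

$$\frac{1}{2}\sum_{i=1}^{N-1}\left[\kappa_{i,i+1}\left(\hat{\mathbf{t}}_{i+1}-\hat{\mathbf{t}}_i\right)^2 + 2C_{i,i+1}\left(\cos\theta_{i,i+1}-\cos\lambda_{i,i+1}\right)\right] \simeq \frac{1}{2}\sum_{i=1}^{N-1}\left[\kappa_{i,i+1}\,\theta_{i,i+1}^2 + C_{i,i+1}\left(\phi_{i,i+1}+\psi_{i,i+1}\right)^2\right] \tag{9}$$

The discrete model defined by (7) has already been used in the context of DNA supercoiling [57].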



B. Stretching contribution to the entropy of bubble nucleation 

The first stretching term in (7) is local, without any coupling between the nearest neighbours. Therefore it can be
integrated out easily. The Lamé elastic constant $\epsilon$ is very large for DNA molecules: $\epsilon a^3$ has been evaluated as 8.4 nN
for ssDNA by fitting force-extension experimental curves using ab-initio calculations [45] [58], and one can expect the
same order of magnitude for dsDNA. Therefore, $\epsilon a^3 \gg k_B T/a \simeq 4$ pN and the saddle point approximation applied
below is valid.

By expanding the first term of (7) and writing $|\mathbf{t}_i| = a_i + \delta_i$ we have

$$\left(|\mathbf{t}_i|^2 - a_i^2\right)^2 = \left(|\mathbf{t}_i| + a_i\right)^2\left(|\mathbf{t}_i| - a_i\right)^2 \simeq 4a_i^2\left(|\mathbf{t}_i| - a_i\right)^2 + O(\delta_i^3) \tag{10}$$

The elastic term of the Hamiltonian (7) simplifies into

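$$H_{chain}[\sigma,\mathbf{t},\psi] \simeq \sum_{i=1}^{N} 2\epsilon_i a_i^2\left(|\mathbf{t}_i| - a_i\right)^2 + \sum_{i=1}^{N-1}\left[\kappa_{i,i+1}\left(1-\cos\theta_{i,i+1}\right) + C_{i,i+1}\left(\cos\theta_{i,i+1} - \cos\lambda_{i,i+1}\right)\right] \tag{11}$$

The configurational part of the partition function is

$$Z = \sum_{\{\sigma_i\}} e^{-\beta H_{Ising}[\sigma]}\int\prod_{i=1}^{N}\frac{\mathrm{d}^3\mathbf{t}_i\,\mathrm{d}\gamma_i}{8\pi^2 a_0^3}\; e^{-\beta H_{chain}[\sigma,\mathbf{t},\psi]} \tag{12}$$

where $\gamma_i$ is the second twist Euler angle of $\mathbf{t}_i$ with respect to the reference frame and $a_0$ is a normalization length. By
using the decomposition of $\mathbf{t}_i$ in spherical coordinates, $(t_i, \alpha_i, \beta_i)$, one has $\mathrm{d}^3\mathbf{t}_i\,\mathrm{d}\gamma_i = t_i^2\,\mathrm{d}t_i\,\sin\alpha_i\,\mathrm{d}\alpha_i\,\mathrm{d}\beta_i\,\mathrm{d}\gamma_i = t_i^2\,\mathrm{d}t_i\,\mathrm{d}^3\omega_i$
and the partial partition function for the chain is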



--chain [fj 



n l t e = (ts - Qi 5 / n ( ^ ) < i3 > 



where $H_{angle}[\sigma, \omega]$ is the bending and torsional Hamiltonian. Using the saddle point approximation for the stretching
integral, we get in the large stretching constant limit

poo i9 11 2 ^ /~o — 

TT / ^e-^C— ) 2 «TTa/— ^EEe-^inA, (14) 
•Li Jo ao f = i V fa ao 

As explained above, we assume that the stretching energy has two competitive minima for dsDNA and ssDNA. In our
model it means that the elastic constant $\epsilon_i$ and the monomer size $a_i$ take two different values depending on whether the monomer
is in the unbroken ($\sigma_i = +1$) or broken state ($\sigma_i = -1$). Hence, once integrated over the local $t_i$ variables, the
stretching energy part can be included in the Ising part of the Hamiltonian to get an effective Ising Hamiltonian with
a renormalized $\mu$. Indeed, by writing $\ln\Lambda_i = \delta\mu\,\sigma_i + \Gamma$, where $\delta\mu$ and $\Gamma$ are fixed by the two values $\Lambda_U$ and $\Lambda_B$, the renormalized
temperature dependent chemical potential is

$$\mu_0 = \mu - k_B T\ln\left(\frac{a_B}{a_U}\sqrt{\frac{\epsilon_U}{\epsilon_B}}\right) \tag{15}$$
where the correction accounts for the entropic gain when the monomer state changes. It has two contributions: 

1. in the broken state, the monomer size is greater, $a_B \simeq 2a_U$, which implies a larger volume in the phase space
and thus an increase in entropy;

2. in the case of different elastic constants, $\epsilon_U \neq \epsilon_B$, since the stretching energy $\langle E\rangle = \frac{1}{2}k_B T$ is independent of
these constants, the elastic free energy difference is purely of entropic origin, similarly to the simple Einstein
model for solids.

In the present case, the elastic constants $\epsilon_U$ and $\epsilon_B$ are unknown. Although several experimental studies seem to
show that the stretching constant of dsDNA is larger than for ssDNA [59, 60], we have not been able to find reliable
values. If, for example, we assume them equal, then the chemical potential $\mu$ is lowered by 0.5 to 1 $k_B T$, which is
non-negligible.
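To make the size of this stretching correction concrete, the short sketch below evaluates the shift $\mu - \mu_0 = k_B T\ln\left[(a_B/a_U)\sqrt{\epsilon_U/\epsilon_B}\right]$ in units of $k_B T$ for two illustrative choices: equal elastic constants, and a dsDNA twice as stiff as ssDNA. The function name `mu_shift` is hypothetical, and the formula is the reading of the renormalized chemical potential adopted here; the two ratios are the ones discussed in the text, not measured values.

```python
import math


def mu_shift(aB_over_aU, epsU_over_epsB):
    """Entropic lowering of mu in units of k_B T: ln[(a_B/a_U) * sqrt(eps_U/eps_B)]."""
    return math.log(aB_over_aU * math.sqrt(epsU_over_epsB))


equal_constants = mu_shift(2.0, 1.0)  # a_B = 2 a_U, eps_U = eps_B
stiffer_dsdna = mu_shift(2.0, 2.0)    # a_B = 2 a_U, eps_U = 2 eps_B

print(f"eps_U = eps_B   : mu lowered by {equal_constants:.2f} k_B T")
print(f"eps_U = 2 eps_B : mu lowered by {stiffer_dsdna:.2f} k_B T")
```

Both cases fall in the range of roughly 0.5 to 1 $k_B T$, comparable to the bare value of $\mu$, which is why the stretching entropy cannot be ignored.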



C. Bending and torsional contributions 



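In this section, we focus on the partition function integrated over the angles ($\mathrm{d}^3\omega_i = \sin\alpha_i\,\mathrm{d}\alpha_i\,\mathrm{d}\beta_i\,\mathrm{d}\gamma_i$). The full
partition function (12) can be written as

$$Z = \sum_{\{\sigma_i\}} e^{-\beta H_{Ising,0}[\sigma]}\int\prod_{i=1}^{N}\frac{\mathrm{d}^3\omega_i}{8\pi^2}\,\exp\left\{-\beta\sum_{i=1}^{N-1}\left[\kappa_{i,i+1}\left(1-\cos\theta_{i,i+1}\right) + C_{i,i+1}\left(\cos\theta_{i,i+1}-\cos\lambda_{i,i+1}\right)\right]\right\} \tag{16}$$

where $H_{Ising,0}$ is the same as (6) with $\mu$ replaced by $\mu_0$ given in (15). Similarly to the Discrete Wormlike Chain
model [17, 18], the partition function for the coupled system can be calculated using transfer matrix techniques. For example,
we have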



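$$Z = \sum_{\{\sigma_i\}}\int\prod_{i=1}^{N}\frac{\mathrm{d}^3\omega_i}{8\pi^2}\,\langle V|\sigma_1\rangle\langle\sigma_1|\hat P(\omega_1,\omega_2)|\sigma_2\rangle\cdots\langle\sigma_{N-1}|\hat P(\omega_{N-1},\omega_N)|\sigma_N\rangle\langle\sigma_N|V\rangle \tag{17}$$

where the matrix elements of the transfer kernel, which appears $N-1$ times in (17), are given by (the tilde means in
units of $k_B T$)

$$\langle +1|\hat P(\omega_i,\omega_{i+1})|+1\rangle = \exp\left[\tilde\kappa_U(\cos\theta_{i,i+1}-1) - \tilde C_U(\cos\theta_{i,i+1}-\cos\lambda_{i,i+1}) + \tilde J + \tilde K + \tilde\mu_0\right] \tag{18}$$

$$\langle -1|\hat P(\omega_i,\omega_{i+1})|-1\rangle = \exp\left[\tilde\kappa_B(\cos\theta_{i,i+1}-1) - \tilde C_B(\cos\theta_{i,i+1}-\cos\lambda_{i,i+1}) + \tilde J - \tilde K - \tilde\mu_0\right] \tag{19}$$

$$\langle +1|\hat P(\omega_i,\omega_{i+1})|-1\rangle = \exp\left[\tilde\kappa_{UB}(\cos\theta_{i,i+1}-1) - \tilde C_{UB}(\cos\theta_{i,i+1}-\cos\lambda_{i,i+1}) - \tilde J\right] \tag{20}$$

$$\phantom{\langle +1|\hat P(\omega_i,\omega_{i+1})|-1\rangle} = \langle -1|\hat P(\omega_i,\omega_{i+1})|+1\rangle \tag{21}$$

It is written in the canonical basis $|U\rangle = |+1\rangle$ and $|B\rangle = |-1\rangle$ of the U and B states. The end vector

$$|V\rangle = e^{\tilde\mu_0/2}|U\rangle + e^{-\tilde\mu_0/2}|B\rangle \tag{22}$$

enters in order to take care of the free chain boundary conditions [18] (see also Section III).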

The partition function can be rewritten by examining the effective Ising model obtained by integrating over the
chain conformational degrees of freedom $\omega_i$ in (17). The problem reduces to that of an effective Ising model with
an "effective free energy" $H_{Ising,eff}$ containing renormalized parameters. This method works because, for the coupled
Ising-chain system, the rotational symmetry is not broken. Hence the matrix obtained by integrating the kernel
$\hat P(\omega_i, \omega_{i+1})$ in (17) is the same for any site $i$.
We are thus able to carry out the angle integrations in sequential fashion by using the triad $\{\mathbf{e}_{\mu,i-1}\}$ as the reference frame
for the $i$th Euler angle integration. Since this corresponds, for each integration, to a rotation of
the variables with Jacobian equal to 1, the Euler angle integrated transfer matrix is



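$$\hat P_{I,eff} = \int\frac{\mathrm{d}^3\omega_{i+1}}{8\pi^2}\,\hat P(\omega_i,\omega_{i+1}) = \begin{pmatrix} e^{-G(\tilde\kappa_U,\tilde C_U)+\tilde J+\tilde K+\tilde\mu_0} & e^{-G(\tilde\kappa_{UB},\tilde C_{UB})-\tilde J} \\ e^{-G(\tilde\kappa_{UB},\tilde C_{UB})-\tilde J} & e^{-G(\tilde\kappa_B,\tilde C_B)+\tilde J-\tilde K-\tilde\mu_0} \end{pmatrix} \tag{23}$$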



where $G(\kappa, C)$ is (in units of $k_B T$) the free energy of a single joint (two-link) subsystem with bending and torsional
rigidities $(\kappa, C)$ (either U-U, B-B or U-B):

$$G(\kappa, C) = -\ln\int\frac{\sin\theta\,\mathrm{d}\theta\,\mathrm{d}\phi\,\mathrm{d}\psi}{8\pi^2}\; e^{\kappa(\cos\theta-1) - C(\cos\theta-\cos\lambda)} \tag{24}$$

$$G(\kappa, C) = 2\kappa - \ln\int_0^1\mathrm{d}x\; I_0(Cx)\, e^{(2\kappa - C)x} \tag{25}$$

where $I_0$ is the modified Bessel function of the first kind [61]. Two interesting cases are:



• $C = 0$, leading to $G(\kappa, 0) = G_0(\kappa)$ already defined in (1), which is an increasing function of $\kappa$ (cf. Figure 1),
so that (25) is a generalization of the previous result [17, 18];

• $\kappa = 0$, leading to $G(0, C) = C - \ln[I_0(C) + I_1(C)]$, which is also an increasing function of $C$ (cf. Figure 1).



The function $G(\kappa, C)$ is plotted in Figure 1, which shows that it is a monotonically increasing function. In the spin-wave
approximation, the integral (24) is computed using the saddle-point approximation and the asymptotic behaviour of
$G$ is

$$G(\kappa, C) \underset{\kappa, C \gg 1}{\longrightarrow} \ln(2\kappa) + \frac{1}{2}\ln(2\pi C) \tag{26}$$

We observe in Figure 1 that the asymptotic limit is a very good approximation for $\kappa$ and $C$ larger than 2, and thus
for real DNA.
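The single-joint free energy can be evaluated directly from its one-dimensional integral form, $G(\kappa, C) = 2\kappa - \ln\int_0^1 \mathrm{d}x\, I_0(Cx)\,e^{(2\kappa - C)x}$, and compared with the asymptotic form $\ln(2\kappa) + \frac{1}{2}\ln(2\pi C)$. The sketch below is a plain-Python illustration, not the procedure used for Figure 1: the helper `i0e`, the quadrature rules, the grid sizes and the function names are all choices made here. The integrand is rewritten as $e^{-Cx}I_0(Cx)\,e^{-2\kappa(1-x)}$ to avoid overflow for stiff chains.

```python
import math


def i0e(z, n=200):
    """Scaled Bessel function exp(-z) * I0(z) for z >= 0.

    Uses I0(z) = (1/pi) * int_0^pi exp(z cos t) dt with the trapezoidal rule.
    """
    h = math.pi / n
    total = 0.5 * (1.0 + math.exp(-2.0 * z))  # end points t = 0 and t = pi
    for k in range(1, n):
        total += math.exp(z * (math.cos(k * h) - 1.0))
    return total * h / math.pi


def G(kappa, C, n=2000):
    """G(kappa, C) = -ln int_0^1 dx i0e(C x) exp[-2 kappa (1 - x)], Simpson rule (n even)."""
    h = 1.0 / n

    def f(x):
        return i0e(C * x) * math.exp(-2.0 * kappa * (1.0 - x))

    s = f(0.0) + f(1.0)
    for k in range(1, n):
        s += (4.0 if k % 2 else 2.0) * f(k * h)
    return -math.log(s * h / 3.0)


def G_asymptotic(kappa, C):
    """Spin-wave limit for kappa, C >> 1: ln(2 kappa) + 0.5 ln(2 pi C)."""
    return math.log(2.0 * kappa) + 0.5 * math.log(2.0 * math.pi * C)


# Values quoted in the text for polydA-polydT, with C = 1.6 kappa.
G_U = G(147.0, 1.6 * 147.0)
G_B = G(5.54, 1.6 * 5.54)
print(f"G_U = {G_U:.2f} (asymptotic {G_asymptotic(147.0, 1.6 * 147.0):.2f})")
print(f"G_B = {G_B:.2f} (asymptotic {G_asymptotic(5.54, 1.6 * 5.54):.2f})")
```

For $C = 0$ the same routine reduces to $G_0(\kappa) = \kappa - \ln(\sinh\kappa/\kappa)$, which gives a simple consistency check on the quadrature.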

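The Hamiltonian of the model (5) then reduces to an effective Hamiltonian which is now of Ising type

$$H_{Ising,eff}[\sigma] = -\mu_0\sum_{i=1}^{N}\sigma_i - \sum_{i=1}^{N-1}\left[J_0\,\sigma_{i+1}\sigma_i + \frac{K_0}{2}(\sigma_{i+1}+\sigma_i)\right] \tag{27}$$

where the bare Ising parameters $K$ and $J$ are renormalized according to

$$K_0 = K - \frac{k_B T}{2}\left[G(\kappa_U, C_U) - G(\kappa_B, C_B)\right] \tag{28}$$

$$J_0 = J - \frac{k_B T}{4}\left[G(\kappa_U, C_U) + G(\kappa_B, C_B) - 2G(\kappa_{UB}, C_{UB})\right] \tag{29}$$

and $\mu_0$ is defined in (15).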

Usually, it is admitted that the torsional modulus is proportional to the bending modulus, $C \simeq 1.6\,\kappa$ [51]. Taking the
same values as in [17, 18] for a polydA-polydT homopolymer, $\tilde\kappa_U = \tilde\kappa_{UB} = 147$ and $\tilde\kappa_B = 5.54$ at $T = T_m = 326$ K,
we get $G(\tilde\kappa_U, \tilde C_U) = 9.3$ and $G(\tilde\kappa_B, \tilde C_B) = 4.3$, which changes $K$ and $J$ by about 2 to 3 $k_B T$ and 1 to 2 $k_B T$
respectively in the temperature range of interest. We have found in [18] $\mu = 1.78\,k_B T$, $J = 3.64\,k_B T$ and $K$ was set
to 0. Hence these entropic contributions are on the same order of magnitude as the bare values and must be taken
into account.
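The orders of magnitude quoted above can be reproduced with a few lines. The sketch below uses the spin-wave form $G(\kappa, C) \simeq \ln(2\kappa) + \frac{1}{2}\ln(2\pi C)$ together with $K_0 - K = -\frac{k_B T}{2}[G_U - G_B]$ and $J_0 - J = -\frac{k_B T}{4}[G_U + G_B - 2G_{UB}]$, with $C = 1.6\,\kappa$ and $\kappa_{UB} = \kappa_U$. Variable names are illustrative, and the asymptotic form slightly overestimates $G$ for the soft B-B joint, so the numbers are indicative only.

```python
import math


def G_spin_wave(kappa, C):
    """Asymptotic joint free energy (units of k_B T): ln(2 kappa) + 0.5 ln(2 pi C)."""
    return math.log(2.0 * kappa) + 0.5 * math.log(2.0 * math.pi * C)


ratio = 1.6            # assumed proportionality C / kappa
kU, kB = 147.0, 5.54   # dimensionless moduli quoted at T_m = 326 K
kUB = kU               # choice kappa_UB = kappa_U made in the text

GU, GB, GUB = (G_spin_wave(k, ratio * k) for k in (kU, kB, kUB))

dK = -0.5 * (GU - GB)               # K0 - K in units of k_B T
dJ = -0.25 * (GU + GB - 2.0 * GUB)  # J0 - J in units of k_B T

print(f"G_U = {GU:.2f}, G_B = {GB:.2f}")
print(f"K0 - K = {dK:+.2f} k_B T, J0 - J = {dJ:+.2f} k_B T")
```

With $\kappa_{UB} = \kappa_U$ the shift of $K$ is negative while that of $J$ is positive, and both have magnitudes comparable to the bare parameters.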



Moreover, with these values, the spin-wave approximation applies and we can summarize (15) and (28) as

$$L_0 = \mu_0 + K_0 \simeq \mu + K - \frac{k_B T}{2}\ln\left(\frac{a_B^2\,\epsilon_U\,\kappa_U\sqrt{C_U}}{a_U^2\,\epsilon_B\,\kappa_B\sqrt{C_B}}\right) \tag{30}$$



showing that the renormalization of the Ising parameters comes essentially from entropic effects, namely stretching,
bending and torsional entropies.

This twist-induced melting might be important in the context of single molecule torque experiments [28, 35, 36,
37, 62] and in the context of superhelically stressed circular dsDNA [63]. For instance, within this model, applying a
torque (or a twist) will locally modify the free energy cost $L$ to nucleate a bubble and will, in return, influence the
mechanical response of the chain.

In the rest of the paper, we will be interested in expectation values depending only on the spin variables $\sigma_i$. Hence,
everything can be computed using directly the effective Ising Hamiltonian (27) with renormalized parameters $\mu_0$, $L_0$
and $J_0$. In principle, the DHWC model could be completely solved by transfer matrix techniques, which would require the
diagonalization of the transfer operator $\hat P(\omega_i, \omega_{i+1})$ defined in (17). This is beyond the scope of the present work.



D. End-to-end distance 



In this section, we compute the end-to-end distance of a dsDNA using the model presented in [17], where we neglect
the torsional term. We show that the difference in monomer sizes in the unbroken and broken states modifies the
end-to-end distance and should be taken into account. Therefore, we complete the findings of [18] where the monomer
sizes were supposed to be equal.

The end-to-end distance of the chain is defined as $R = \sqrt{R^2}$, where

$$R^2 = \sum_{i,j=1}^{N}\left\langle (a_i\hat{\mathbf{t}}_i)\cdot(a_j\hat{\mathbf{t}}_j)\right\rangle \tag{31}$$

The monomer size, which depends on the internal variable $\sigma_i$, can be written as $a_i = A\sigma_i + B$ with $A = (a_U - a_B)/2$
and $B = (a_U + a_B)/2$. In the thermodynamic limit, $N \to \infty$, this expression simplifies to

$$\frac{R^2}{N} \underset{N\to\infty}{\longrightarrow} \left\langle A^2\sigma_i^2 + 2AB\,\sigma_i + B^2\right\rangle + 2\sum_{r=1}^{\infty}\left[A^2\left\langle\sigma_i\,\hat{\mathbf{t}}_i\cdot\hat{\mathbf{t}}_{i+r}\,\sigma_{i+r}\right\rangle + AB\left(\left\langle\sigma_i\,\hat{\mathbf{t}}_i\cdot\hat{\mathbf{t}}_{i+r}\right\rangle + \left\langle\hat{\mathbf{t}}_i\cdot\hat{\mathbf{t}}_{i+r}\,\sigma_{i+r}\right\rangle\right) + B^2\left\langle\hat{\mathbf{t}}_i\cdot\hat{\mathbf{t}}_{i+r}\right\rangle\right] \tag{32}$$

which is independent of $i$. By using the transfer matrix approach and the results already presented in [18], we find
after some lengthy calculations

~ — » A 2 + 2AB(a z )+2B 2 e cS (33) 

iV N— voo 

+2^ (A 2 (l, r\a z \0, +) 2 + 2AB(0, +|1, r)(l,r|^|0, +)) - % 
FIG. 2: End-to-end distance (in units of base-pair size) as a function of the temperature $T$ for the parameter values $\mu$ =
4.46 kJ/mol, $J$ = 9.13 kJ/mol, $K$ = 0, corresponding to $T_m$ = 326.4 K, and $\kappa_U = \kappa_{UB}$ = 147, $\kappa_B$ = 5.5, $a_B = 2a_U$. The full
calculation (33) (in red) and the interpolation formula (37) (in black) coincide. The blue and green broken lines correspond to
the bare dsDNA and ssDNA respectively.




where the effective persistence length is defined as 

&4£ M °.+> 9 ir^ (34) 

T 

The Pauli matrix $\sigma_z$ acts only on the second part of the basis that diagonalizes the transfer matrix operator $\hat P$:
$|\Psi_{l,m,\tau}\rangle = |l, m\rangle\otimes|l, \tau\rangle$ (where $(l, m)$ are the quantum numbers associated to the spherical harmonics and $\tau = \pm$
labels the eigenstates of the Ising model). In the basis $|0, \pm\rangle$ we have

a z =( A %— V^-^M. (35) 



where $\langle c\rangle_\infty$ is the expectation value of the average spin variable (or "magnetization") in the thermodynamic limit

$$\frac{1}{N}\sum_{i=1}^{N}\langle\sigma_i\rangle \underset{N\to\infty}{\longrightarrow} \langle c\rangle_\infty = \frac{\sinh(\tilde L_0)}{\left[\sinh^2(\tilde L_0) + e^{-4\tilde J_0}\right]^{1/2}} \tag{36}$$

The parameter $L_0$ is defined in (1) and $J_0$ in (29), setting $C = 0$ for the three cases. The two orthonormal eigenvectors
for a fixed $l$ are defined in [17, 18].
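The infinite-chain magnetization translates directly into a melting curve. The sketch below evaluates $\langle c\rangle_\infty = \sinh(\tilde L_0)/[\sinh^2(\tilde L_0) + e^{-4\tilde J_0}]^{1/2}$ with torsion neglected ($C = 0$) and converts it into a fraction of broken base-pairs through $\varphi_B = (1 - \langle c\rangle_\infty)/2$, which follows from $\sigma_i = -1$ labelling the broken state. The parameter values are those of the Fig. 2 caption. Keeping the dimensionless bending moduli fixed at their quoted values over the whole temperature range is a simplifying assumption of this illustration, as are the function names and the bisection search.

```python
import math

R = 8.314462e-3  # gas constant in kJ/(mol K)


def G0(x):
    """Bending-only joint free energy in units of k_B T."""
    return x - math.log(math.sinh(x) / x)


def renormalized(T, mu, J, K, kU, kB, kUB):
    """Dimensionless L0 and J0 (units of k_B T) with C = 0 for all three joint types."""
    kT = R * T
    L0 = (mu + K) / kT - 0.5 * (G0(kU) - G0(kB))
    J0 = J / kT - 0.25 * (G0(kU) + G0(kB) - 2.0 * G0(kUB))
    return L0, J0


def phi_B(T, mu=4.46, J=9.13, K=0.0, kU=147.0, kB=5.5, kUB=147.0):
    """Fraction of broken base-pairs for an infinite chain."""
    L0, J0 = renormalized(T, mu, J, K, kU, kB, kUB)
    c = math.sinh(L0) / math.sqrt(math.sinh(L0) ** 2 + math.exp(-4.0 * J0))
    return 0.5 * (1.0 - c)


def melting_temperature(lo=300.0, hi=360.0):
    """Bisection on phi_B(T) = 1/2, assuming phi_B increases with T on [lo, hi]."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if phi_B(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


T_m = melting_temperature()
print(f"T_m = {T_m:.2f} K")
for T in (320.0, T_m, 333.0):
    print(f"phi_B({T:.2f} K) = {phi_B(T):.4f}")
```

The transition of the infinite chain is very sharp because $e^{-2\tilde J_0}$ is small, which is the cooperativity encoded in the renormalized $J_0$.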

The result (33) is shown in Figure 2 for $a_B = 2a_U$. An accurate interpolating formula is given by

$$R^2_{interpol} = 2N\left(\varphi_U\,a_U\,\xi^p_U + \varphi_B\,a_B\,\xi^p_B\right) = (1-\varphi_B)\,R^2_{ds} + \varphi_B\,R^2_{ss} \tag{37}$$

thus generalizing a similar result given in [18] for the case $a_U = a_B$.



III. FINITE SIZE EFFECTS WITHIN THE DHWC MODEL 



In this section, we study the behaviour of the fraction of open base-pairs, $\varphi_B(N,T)$, as a function of both temperature
and chain length for homogeneous DNA with free and modified boundary conditions (necessary for DNA
inserts). Despite early recognition [64] that a careful experimental study of such homogeneous DNA polymers of
varying length would be of great help in advancing our theoretical understanding of DNA denaturation, unfortunately
such a study has not yet been carried out. As a consequence, important questions concerning the competition between
end unwinding and internal bubble formation for finite chains, as well as the correct form of the loop entropy factor
(including the effect of chain rigidity) and the role of chain dissociation, remain open. Our goal here is to shed
further light on the role of polymer length in the thermal denaturation of homogeneous DNA (see [55] for a recent study
of finite size effects within the framework of generalizations of the Peyrard-Bishop model).

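The model we use here is a generalization of the one presented in [17, 18] and has been defined in Section II. The
renormalized chemical potential is given by (15) and for purposes of illustration we use the simpler, but accurate,
spin-wave approximations for the two other renormalized parameters, summarized here:

$$L_0(T) \simeq L - \frac{k_B T}{2}\ln\left(\frac{a_B^2\,\epsilon_U\,\kappa_U\sqrt{C_U}}{a_U^2\,\epsilon_B\,\kappa_B\sqrt{C_B}}\right) \tag{38}$$

$$J_0(T) \simeq J - \frac{k_B T}{4}\ln\left(\frac{\kappa_U\,\kappa_B\sqrt{C_U C_B}}{\kappa_{UB}^2\,C_{UB}}\right) \tag{39}$$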

where $L = \mu + K$. We also adopt the following physically reasonable set of model parameters: $\kappa_U/\kappa_B = 147/5.5 = 26.7$,
$\kappa_{UB} = \kappa_U$, $a_B/a_U = \epsilon_U/\epsilon_B = 2$, and $C_B/\kappa_B = C_U/\kappa_U = C_{UB}/\kappa_{UB} = 1.6$. When loop entropy is not included in
the model we use a value for $J$ obtained previously by fitting experimental melting data for the homopolynucleotide
polydA-polydT: $J = 9.13$ kJ/mol [17, 18] (we recall that the renormalized value $J_0$ is a key parameter in determining
the transition width). When the effect of loop entropy on the thermal denaturation of free chains is studied, we
will use a smaller value of $J$, half of the larger one, as it is well known that loop entropy tends to sharpen the
transition [42, 53, 64]. If the model prediction (without loop entropy) for the melting temperature of polydA-polydT
of length $N = 30000$ base-pairs is chosen to agree with the experimental results in [TJ ($T_m^{expt} = 338.70$ K), then we
obtain $L = \mu + K = 9.87$ kJ/mol, close to the value obtained by setting $L_0(T_m^{expt}) = 0$ (which gives the model result
without loop entropy for the infinite chain melting temperature, see Eq. 38). Using $J = 9.13$ kJ/mol, we find that
$J_0 \simeq 12.3$ kJ/mol at $T = 339$ K, which implies that the entropic contribution is greater than 25 % near the melting
temperature [$\tilde J = \beta J = 3.23$ for $\beta = 1/(k_B T_m^{expt})$]. Using $J = 4.57$ kJ/mol, we find that $J_0 \simeq 7.70$ kJ/mol at
$T = 339$ K, which gives an entropic contribution of 41 %.
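The quoted value of $L$ can be recovered from the infinite-chain condition $L_0(T_m) = 0$ in its spin-wave form, $L = \frac{k_B T_m}{2}\ln\left[\frac{a_B^2\,\epsilon_U\,\kappa_U\sqrt{C_U}}{a_U^2\,\epsilon_B\,\kappa_B\sqrt{C_B}}\right]$. The sketch below plugs in the parameter ratios listed above; since $C = 1.6\,\kappa$ in every state, $C_U/C_B = \kappa_U/\kappa_B$. The function name and the use of molar units are choices made for this illustration.

```python
import math

R = 8.314462e-3  # gas constant in kJ/(mol K)


def L_from_melting(T_m, aB_over_aU, epsU_over_epsB, kU_over_kB, CU_over_CB):
    """Bare L = mu + K (kJ/mol) such that the spin-wave L0 vanishes at T_m."""
    arg = aB_over_aU ** 2 * epsU_over_epsB * kU_over_kB * math.sqrt(CU_over_CB)
    return 0.5 * R * T_m * math.log(arg)


kappa_ratio = 147.0 / 5.5
L = L_from_melting(338.70, 2.0, 2.0, kappa_ratio, kappa_ratio)
print(f"L = {L:.2f} kJ/mol")
```

The result agrees with the value of about 9.87 kJ/mol quoted in the text for the long polydA-polydT chain.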

In our previous work [17, 18] we assumed that the difference in bare stacking energy, $K$, between the U and B states
was zero. This choice was based on evidence that near room temperature single stranded polyrA remains stacked [66].
It seems, however, that near the dsDNA melting temperature dT single strands are probably completely and dA ones
partially unstacked [TJ with an unstacking fraction close to 75% near $T_m$ [TJ. We can conclude that the single dT and
dA strands in polydA-polydT bubbles may have much less stacking energy than the helical segments, and we incorporate
this effect into the model by introducing a weighting parameter, $f$, that measures the contribution of $K$ to $L$ at fixed
$L$: $K = fL$ and $\mu = (1 - f)L$. Although the two unbound single dT and dA strands in a polydA-polydT bubble may
not behave exactly like two free single dT and dA strands, the above discussion does suggest that $f$ may be large
near the melting temperature. Indeed, if we accept the putative experimental value for the bare enthalpy needed to
open one A-T base-pair as a measure of $\mu$, then we find $\mu \simeq 5.25$ kJ/mol [17, 18] [IHZ]. Using this result and the above
value for $L$ then yields $f \simeq 0.5$. When $f$ is taken to be zero there is no loss in stacking energy when a bubble opens
and we recover the case previously studied in [17, 18].
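The estimate of the weighting parameter is a one-line computation from $\mu = (1 - f)L$ and $K = fL$, spelled out below with the two values quoted in the text. Variable names are illustrative.

```python
L = 9.87   # kJ/mol, bare L = mu + K obtained from the melting temperature
mu = 5.25  # kJ/mol, putative bare enthalpy to open one A-T base-pair

f = 1.0 - mu / L  # from mu = (1 - f) L
K = f * L         # corresponding loss of stacking energy, K = f L

print(f"f = {f:.2f}, K = {K:.2f} kJ/mol")
```

The result, slightly below one half, is what the text rounds to $f \simeq 0.5$.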

An important question is how to incorporate bubble loop entropy into statistical models of fluctuating DNA. This
loop contribution arises from the extra cost in free energy (with respect to two single unbound end chains) needed
to form a closed loop of bases making up a bubble [42, 64, 68, 69]. When loop entropy is neglected, Poland-Scheraga
(PS) type models reduce to effective Ising ones, albeit without the end-interior asymmetry that naturally arises within
our approach from the difference between $L$ and $\mu_0$ (see Eq. 20 of [18]). This can arise both from a dissimilarity
between $\mu$ and $K$ and from the renormalizations coming from integrating out the conformational degrees of freedom.
If without justification we formally set $\mu_0$ equal to $L$ we recover previous Ising/PS type models without loop entropy.

For finite DNA polymers, end effects may have a strong influence on both the thermal denaturation transition 
and chain conformational properties. As already discussed in [18], the coupled DNA model that we have developed
is extremely useful for investigating the dependence of various system properties on chain length, N. For DNA 
homopolymers two types of situations can be envisaged: (i) finite homopolymers with free end boundary conditions, 
and (ii) finite polydA-polydT inserts between more stable G-C rich domains with much higher melting temperatures. 

Case (i) has already been extensively studied theoretically in [18] when the loop entropy associated with bubbles
is neglected. Although for very long chains end effects are unimportant and $f$ plays no role (only the value of $L$
is important), for not too long finite chains $f$ has a strong influence on the melting curves. Within the scope of
our model with $f = 0$ it was found previously that for finite DNA chains thermal denaturation takes place in an
inhomogeneous fashion with the probability of base-pair opening being higher at chain ends for temperatures $T < T^*$.
At the temperature $T^*$ the fraction of broken base-pairs becomes independent of chain length and the probability of
base-pair opening becomes independent of position on the chain (see Figs. 6 and 7 of [18]). For $f = 0$ it was also found
that the melting temperature obeys $T^* < T_m^\infty < T_m(N)$ [where $T_m^\infty = T_m(N \to \infty)$] and, along with the transition
width, decreases with increasing $N$. For $T < T^*$ the fraction of open base-pairs, $\varphi_B(N)$, decreases with increasing $N$,
whereas for $T > T^*$ it increases with increasing $N$. We extend this previous theoretical study here by investigating
the influence of the weighting factor $f$.

Without loop entropy previous Ising/PS type models predict $T_m$ independent of $N$ and therefore $T_m = T^*$. When
loop entropy is added to these models $T_m(N)$ becomes an increasing function of $N$, which appears to agree with
experiment (in [1] it was found that the melting transition for a free homopolymer of length N = 30000 takes place at
a temperature 1 K higher than that of chains of length $N \approx 500$ and is much sharper). We also examine in detail the
validity of the one-sequence approximation for free boundary conditions and investigate the influence of loop entropy
in situations where the accuracy of this approximation can be gauged [42, 64]. For free boundary conditions this
approximation involves keeping only the base-pair states forming one interior bubble or one helix section of variable
length $0 \le n \le N$. Unfortunately, there do not appear to be any detailed experimental studies of the thermal
denaturation of DNA homopolymers with free ends as a function of chain length (see, however, [53, 70]) that can be
used to test the model predictions and clarify the role and importance of both end effects and bubble loop entropy.

For the case (ii) of an A-T insert of length N in larger more stable DNA polymers, detailed experiments [1] have
already been carried out for 60 < N < 140 and also interpreted using both a simple two-state approximation for
the A-T insert and the Poland-Scheraga model [42] (including loop entropy) for the entire polymer pQ. For inserts
the boundary conditions are fixed mainly by the exterior G-C rich domains and only $L$ enters (and not $f$, i.e., the
individual values of $\mu$ and $K$). For inserts the one-sequence approximation involves keeping only the base-pair states
forming one bubble of variable length $0 \le n \le N$. The two-state approximation accounts only for the completely
closed and the completely open chain states in the partition function [42] and is a special case of the more general
one-sequence approximation. The validity of these types of approximations relies intimately on the relatively large
cost in free energy for creating a bubble (or base-pair domain walls) compared with the cost of changing the length
of an already existing bubble (i.e., $|L_0| \ll J_0$). The upshot is that a one-bubble state can have a variable length (and
in dynamics undergoes breathing) and such states should dominate the free energy for not too long chains (and for
longer chains, temperatures not too close to the melting one).

We reexamine this problem by analyzing the same experimental results [1] using our coupled model for a finite
chain with modified boundary conditions, because in such situations the nature of end monomers becomes extremely
important. In doing so, we study the validity of both the two-state and one-sequence approximations without loop
entropy by comparing the predictions of these simplified approaches to those obtained from the exact solution to
our model. By incorporating the loop entropy into the one-sequence approximation, we also examine the role and
importance of this effect for homopolymer inserts. In order to compare the predictions of the model with experiments
on A-T inserts we have fitted the DNA melting data presented in Figs. 6 and 7 of [1] using simple fitting functions,
the goal being to get a smooth approximation to the data (see Appendix) that will be useful in this section.



A. Exact results for General Chain Boundary Conditions (without loop entropy) 



Using transfer matrix techniques we have shown that it is possible to obtain a compact expression for the average
fraction of open base-pairs in a finite chain of length N for arbitrary boundary conditions [18] (with neither loop
entropy, nor chain sliding):

$$\varphi_B(N,T;\mu') = \frac{1}{2}\left[1 - \langle c\rangle(N,T;\mu')\right], \qquad (40)$$

where $\langle c\rangle(N,T;\mu') = (1/N)\sum_{i=1}^{N}\langle\sigma_i\rangle$ is given by

$$\langle c\rangle(N,T;\mu') = \langle c\rangle_\infty \times \left[1 - \frac{2R_V^2}{R_V^2 + e^{(N-1)/\xi_I}}\right] + \frac{2R_V}{N}\,\frac{\sqrt{1-\langle c\rangle_\infty^2}\,\left(1 - e^{-N/\xi_I}\right)}{\left[1 + R_V^2\, e^{-(N-1)/\xi_I}\right]\left(1 - e^{-1/\xi_I}\right)}, \qquad (41)$$

$\xi_I$ is the Ising correlation length, and $R_V(\mu')$ depends on the boundary conditions through the normalized end vector

$$|V(\mu')\rangle = \left[2\cosh(\mu')\right]^{-1/2}\left(e^{\mu'/2}\,|U\rangle + e^{-\mu'/2}\,|B\rangle\right) \qquad (43)$$

enforcing the chain boundary conditions. The quantities $\langle c\rangle(N,T;\mu')$, $\langle c\rangle_\infty$ given in (36), $R_V(\mu')$, and $\xi_I$ are all
functions of $L_0$ and $J_0$ [18]. For free ends $\mu' = \mu_0$, whereas for closed (open) ends, $|V\rangle = |U\rangle$ ($|B\rangle$), which can be
seen by taking the $\mu' \to \pm\infty$ limits of (43). When $\mu'$ is formally set equal to $L_0$ there is no longer any end-interior
asymmetry and the model reduces to older Ising/PS type [53] models without loop entropy.

FIG. 3: Fraction of broken base-pairs (40) vs. temperature for free boundary conditions (without loop entropy) and chain
lengths of N = 30000, 136, 105, 83, and 67 (from left to right, above the temperature of intersection, $T^*$). (a) $f = 0$, (b) 0.4,
(c) 0.6, (d) 0.7, (e) 0.8 (other model parameters used are listed at the beginning of Section III).

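A simple expression can be obtained for $R_V$ by setting N = 1 in (41) and solving for $R_V$:

$$R_V = \frac{\langle c\rangle_1 - \langle c\rangle_\infty}{\sqrt{1-\langle c\rangle_1^2} + \sqrt{1-\langle c\rangle_\infty^2}}, \qquad (44)$$

where $\langle c\rangle_1 = \tanh(\mu')$ is a function of $\mu'$ and therefore reflects the boundary conditions.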



1. DNA Chains with free boundary conditions 

When $\mu' = \mu_0$ (free boundary conditions), $R_{V,{\rm free}} = R_V(\mu_0)$ simplifies in the following way for special values
of $T$ [18]:

$$R_{V,{\rm free}} = \begin{cases} -e^{-\mu_0}, & T \ll T^* \\ 0, & T = T^* \\ \tanh(\mu_0/2), & T = T_m^\infty \end{cases} \qquad (45)$$

which shows that $R_{V,{\rm free}}$ is a monotonically increasing function of $T$ and vanishes at $T = T^*$.

In Fig. 3 we present model results (with neither loop entropy, nor chain sliding) based on Eq. (41) for free chains
of different lengths and different values of $f$. We observe that $T^*$ increases with increasing $f$; for $f < 0.7$, $T_m(N)$
decreases with increasing $N$, whereas for $f > 0.7$, $T_m(N)$ increases with increasing $N$. When $f \approx 0.7$, the melting
curves are nearly identical with the results obtained from older Ising/PS type models ($\mu' = L_0$) without loop entropy.
When loop entropy is added to the model the melting temperatures for the longer chains will be shifted to the right,
amplifying the effect of finite $f$ (see below).



FIG. 4: Fraction of broken base-pairs (40) vs. temperature for N = 136 and $f = 0$ ($\mu = L = 9.87$ kJ/mol) as a function of $\mu'$:
from left to right, free boundary conditions $\mu' = \mu_0$; $\mu'/L = 0.86$; 1.14; 1.43; 2.00; 8.56; for closed boundary conditions the
result is superimposed on the right-hand curve ($\mu'/L = +\infty$).



2. DNA Inserts with closed boundary conditions 

For an A-T insert of length N in more stable G-C domains a simple starting approximation is to apply closed 
boundary conditions (i.e., base-pairs i = 1 and N are considered to be held closed due to their coupling to the 
adjacent G-C domains). For closed boundary conditions, $\mu' \to \infty$, leading to

$$R_{V,{\rm cl}} = \sqrt{\frac{1 - \langle c\rangle_\infty}{1 + \langle c\rangle_\infty}}, \qquad (46)$$

which is non-zero for all $T > 0$, implying that in this case $T^* = 0$.

Unfortunately in this case only $N - 2$ base-pairs can open. A better approach involves artificially extending the
insert length from $N$ to $N + 2$ and using closed boundary conditions on the extended chain. In this case the "fictitious"
($i = 1$ and $i = N + 2$) base-pairs are held closed by the boundary conditions in order to simulate the influence of the
adjacent more stable G-C rich domains and the remaining $N$ base-pairs can fluctuate. Since the $i = 2$ and $i = N + 1$
base-pairs are adjacent to closed base-pairs their probability of opening will be lower than that of interior ones. It is
clear that in this case melting will begin near the center of the insert. If $\varphi_B^{\rm cl}(N,T)$ is the fraction of open base-pairs
for a chain of length $N$ with closed boundary conditions, then simple counting shows that the average fraction of open
base-pairs in the extended model is given by

$$\varphi_B^{\rm ext}(N,T) = \frac{N+2}{N}\,\varphi_B^{\rm cl}(N+2,T). \qquad (47)$$

A more sophisticated approach is to keep the physical insert length of $N$ and account for the coupling to the more
stable G-C rich domains via a mean-field type approximation by taking $\mu_0 < \mu' < \infty$. The approaches presented
above are obviously valid only when the temperature is sufficiently far below the melting temperature of the G-C rich
domains so that the experimental UV absorbance used to measure $\varphi_B(N,T)$ comes primarily from the A-T inserts in
the temperature range of interest.
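As a numerical illustration of Eqs. (40), (41), (46) and (47), the following minimal sketch evaluates the open fraction for closed boundary conditions and for the extended model. It treats $\langle c\rangle_\infty$ and $\xi_I$ as plain inputs (their dependence on $L_0$ and $J_0$ is given in (36) and [18] and is not reproduced here); the function names and the numerical values are illustrative assumptions, not fitted parameters.

```python
import math

def r_v_closed(c_inf):
    """Eq. (46): R_V for closed boundary conditions."""
    return math.sqrt((1.0 - c_inf) / (1.0 + c_inf))

def avg_c(N, c_inf, xi, r_v):
    """Eq. (41): chain-averaged <c> for a finite chain of N base-pairs.

    The bulk factor (1 - R^2 d)/(1 + R^2 d), with d = exp(-(N-1)/xi),
    is algebraically the same as 1 - 2 R^2 / (R^2 + exp((N-1)/xi)).
    """
    decay = math.exp(-(N - 1) / xi)
    denom = 1.0 + r_v ** 2 * decay
    bulk = c_inf * (1.0 - r_v ** 2 * decay) / denom
    ends = (2.0 * r_v / N) * math.sqrt(1.0 - c_inf ** 2) \
        * (1.0 - math.exp(-N / xi)) / (denom * (1.0 - math.exp(-1.0 / xi)))
    return bulk + ends

def phi_b(N, c_inf, xi, r_v):
    """Eq. (40): average fraction of open base-pairs."""
    return 0.5 * (1.0 - avg_c(N, c_inf, xi, r_v))

def phi_b_extended(N, c_inf, xi):
    """Eq. (47): N fluctuating base-pairs between two fictitious closed ones."""
    return (N + 2) / N * phi_b(N + 2, c_inf, xi, r_v_closed(c_inf))

# Illustrative inputs only: c_inf = 0.3 and xi = 2 base-pairs.
for n in (10, 100, 1000):
    print(n, round(phi_b_extended(n, 0.3, 2.0), 4))
```

With closed ends the sketch returns a vanishing open fraction for N = 1 and N = 2, in line with the remark that only $N - 2$ base-pairs can open, and it approaches the bulk value $(1 - \langle c\rangle_\infty)/2$ for long chains.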

In Fig. 4 we show how $\varphi_B(N,T;\mu')$ varies as a function of $\mu'$ for N = 136. The melting temperature as a function
of $\mu'$ interpolates smoothly between the results for free ($\mu' = \mu_0$) and closed boundary conditions over a temperature
range of ~5 K and the width of the transition increases slightly with increasing $\mu'$.

FIG. 5: Fraction of broken base-pairs (40) vs. temperature for (from left to right) N = 30000, 136, 105, 83, and 67: fitted
experimental results from Fig. A.2a (dashed curves) and (a) model predictions using free boundary conditions, (b) $\mu'$ optimized
to fit $T_m(N)$, and (c) closed boundary conditions.

In Fig. 5 we compare the experimental results for A-T inserts (Fig. A.2a) with the model predictions for $\varphi_B(T,N)$
for $f = 0$ and three different model boundary conditions: (i) free boundary conditions, (ii) optimized $\mu'$, (iii) extended
model, closed boundary conditions. The value of $L = 9.87$ kJ/mol is held fixed to reproduce the experimental melting
temperature for N = 30000 and the model predictions for the optimized [for $T_m(N)$] $\mu'$ case are practically insensitive
to changes in $f$ and $J$. For closed boundary conditions $\varphi_B(N)$ increases with increasing $N$ at fixed $T$ simply because
the end effects get attenuated for long chains, as illustrated in Fig. 5. We conclude that the model in its present form
can reproduce the qualitative tendencies, but not the quantitative details, of experiments on short A-T inserts (for
such short chains including loop entropy in the model will not lead to better fits, see below). The results presented
here do allow us, however, to gauge the importance of chain boundary conditions on the melting curves. One difficulty
in applying the present model arises because the simplified approach presented here does not account for the increased
probability of opening for G-C base-pairs adjacent to the A-T inserts. The complete solution of our model for the full
heterogeneous chain is in principle possible using known numerical methods, as is discussed in the Conclusion.

The exact result for $\varphi_B(N,T;\mu')$ (40) does not reveal in a physically transparent way which states contribute the
most for a given chain length N and temperature T and, as already mentioned, includes neither the effects of loop 
entropy, nor of chain sliding. In order to include such effects in a straightforward way we now study the one-sequence 
approximation to the exact partition function for our model, an approximation that should be valid for sufficiently 
short chains. 



B. One-sequence approximation 

1. One-sequence approximation for closed boundary conditions: DNA Inserts 

We start by examining the one-sequence approximation for homopolymer inserts of length N for closed boundary
conditions without loop entropy. The effective free energy of creating an interior n-bubble with two base-pair domain
walls is [18]

$$\beta\Delta G_{\rm int}^{(n)} = 4J_0 + 2nL_0 \qquad (48)$$

and therefore the restricted partition function, $Z^{\rm cl}_{1{\rm seq}}$, including only n-bubbles varying in size between n = 0 (helical
insert) and N (bubble insert) is given by:

$$Z^{\rm cl}_{1{\rm seq}} = 1 + \sum_{m=0}^{N-1}(m+1)\exp\left[-\beta\Delta G_{\rm int}^{(N-m)}\right], \qquad (49)$$

where the first term equal to one comes from the completely closed chain state and, for an n-bubble, $m = N - n$ is the
number of remaining intact base-pairs in the insert. The factor of $(m + 1) = N - n + 1$ in the sum is entropic in
nature and equal to the number of ways of placing an n-bubble inside an insert of length N. We recall that $L_0$
becomes negative for $T > T_m^\infty$ and therefore in the high temperature range of interest for inserts the term depending
on $\Delta G_{\rm int}^{(n)}$ in (49) favors large bubbles. The entropic factor, on the other hand, favors small bubbles. The one-sequence
approximation incorporates the first two terms (of order 0 and 1) in an expansion in powers of the loop initiation
factor,

$$\sigma_{\rm LI} = e^{-4J_0}, \qquad (50)$$

which counts the number of bubbles.



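Within the one-sequence approximation the average fraction of broken base-pairs can be obtained from $Z^{\rm cl}_{1{\rm seq}}$:

$$\varphi^{\rm cl}_{B,1{\rm seq}} = -\frac{1}{2N}\,\frac{\partial \ln Z^{\rm cl}_{1{\rm seq}}}{\partial L_0}. \qquad (51)$$

The sums in (49) can be carried out to find the following compact expression:

$$Z^{\rm cl}_{1{\rm seq}} = 1 + e^{-4J_0}\,C\!\left(e^{2L_0}\right), \qquad (52)$$

where

$$C(x) = x^{-N}\left(x\,p'(x) + p(x)\right) \qquad (53)$$

with

$$p(x) = \frac{x^N - 1}{x - 1}. \qquad (54)$$

By using (52) the following expression can be obtained for $\varphi^{\rm cl}_{B,1{\rm seq}}(N)$:

$$\varphi^{\rm cl}_{B,1{\rm seq}}(N) = -\frac{e^{-4J_0}}{N}\,\frac{x}{Z^{\rm cl}_{1{\rm seq}}}\,\frac{\partial C}{\partial x}\bigg|_{x = e^{2L_0}}. \qquad (55)$$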



For sufficiently short chains the one-sequence approach without loop entropy defined above will be an accurate
approximation to the exact result for the extended model $\varphi_B^{\rm ext}$ given in (47) ($N + 2$ base-pairs with closed boundary
conditions). When this approximation is valid, multi-bubble states are extremely rare [the range of validity in $N$ of
the one-sequence approximation depends on the value of $J_0$ via $\sigma_{\rm LI}$ (50)].
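The closed-form resummation in (52)-(54) can be checked against the direct sum (49) numerically. The sketch below does this and also evaluates the open fraction as the weighted average $\langle n\rangle/N$, which is what (51) amounts to. The inputs $J_0$ and $L_0$ are dimensionless (in units of $k_BT$); the values used are illustrative assumptions and the function names are ours.

```python
import math

def z_direct(N, J0, L0):
    """Eq. (49): 1 + sum_{m=0}^{N-1} (m+1) exp[-(4 J0 + 2 (N-m) L0)]."""
    return 1.0 + sum((m + 1) * math.exp(-(4 * J0 + 2 * (N - m) * L0))
                     for m in range(N))

def z_closed_form(N, J0, L0):
    """Eqs. (52)-(54): 1 + exp(-4 J0) C(x) with x = exp(2 L0); needs L0 != 0."""
    x = math.exp(2 * L0)
    p = (x ** N - 1) / (x - 1)
    dp = (N * x ** (N - 1) * (x - 1) - (x ** N - 1)) / (x - 1) ** 2
    return 1.0 + math.exp(-4 * J0) * x ** (-N) * (x * dp + p)

def phi_b_one_seq(N, J0, L0):
    """Open fraction <n>/N, equivalent to -(1/2N) d ln Z / d L0 of Eq. (51)."""
    weights = [(m + 1) * math.exp(-(4 * J0 + 2 * (N - m) * L0)) for m in range(N)]
    z = 1.0 + sum(weights)
    return sum((N - m) * w for m, w in enumerate(weights)) / (N * z)

# Illustrative values: J0 = 3 (strong bubble suppression), L0 slightly negative.
N, J0, L0 = 50, 3.0, -0.02
print(z_direct(N, J0, L0), z_closed_form(N, J0, L0), phi_b_one_seq(N, J0, L0))
```

The direct sum costs only $O(N)$ operations, so the closed form is mainly useful for analytical work; at $L_0 = 0$ the closed form has a removable singularity and the direct sum should be used instead.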

Although it is difficult to incorporate bubble loop entropy into our model in a general way because of mathematical 
complications arising from the "long-range" nature of the loop entropy factor, it is easy to do so within the one- 
sequence approximation. Including the loop entropy lowers the probability of n-bubble opening. We adopt a common 
simplified form for the loop entropy factor associated with n broken base pairs j43j [68j ES] , 



$$g_{\rm LE}(n) = (n_0 + 2 + 2n)^{-k}, \qquad (56)$$

that depends on the bubble loop length, $\ell_n = 2 + 2n$, and is parametrized by a constant $n_0$ and an exponent $k$.
The loop entropy exponent $k$ is thought to be in the range $3/2 < k < 2.1$, depending on the extent to which chain
self-avoidance is taken into account [7]. The term $n_0$ accounts for the enhanced difficulty of forming small closed
bubbles arising from DNA chain stiffness. Including the loop entropy leads to a modified one-sequence partition
function, given by



$$Z^{\rm cl,LE}_{1{\rm seq}} = 1 + \sum_{m=0}^{N-1}(m+1)\,g_{\rm LE}(N-m)\exp\left[-\beta\Delta G_{\rm int}^{(N-m)}\right]. \qquad (57)$$

The introduction of loop entropy ($k > 0$) in $Z^{\rm cl,LE}_{1{\rm seq}}$ can have an exaggerated effect on the calculated melting curves
if the loop initiation factor, $\sigma_{\rm LI}$ (50), is not readjusted at the same time. If we define $D = (n_0 + 2)/2$ and use
$J_0 + (k/4)\ln(2D)$ as the readjusted value of $J_0$ in (57), so that the prefactor $(2D)^{-k}$ of $g_{\rm LE}$ is absorbed into the
loop initiation factor, then $Z^{\rm cl,LE}_{1{\rm seq}}$ can be rewritten as

$$Z^{\rm cl,LE}_{1{\rm seq}} = 1 + \sum_{m=0}^{N-1}(m+1)\left[1 + (N-m)/D\right]^{-k}\exp\left[-\beta\Delta G_{\rm int}^{(N-m)}\right], \qquad (58)$$

with $\Delta G_{\rm int}^{(n)}$ still given by (48) (in the fitting of experimental data, the value of $D$ has been taken to be as large as 96 [1]
and even 450 [68]). The above readjustment of $J_0$ means that only long n-bubbles ($n = N - m > D$) "feel" the effect
of loop entropy (the suppression of short bubble formation due to increased chain stiffness being incorporated directly
into the readjusted $J_0$). We will compare the predictions of the one-sequence approximation with ($k, D > 0$) and
without ($k = 0$) loop entropy using (58). Although the sums in (58) apparently cannot be carried out analytically,
once they are performed numerically, the analog of (51) can be used to obtain $\varphi^{\rm cl,LE}_{B,1{\rm seq}}$.

FIG. 6: Comparison of the two-state approximation (dashed curves) without loop entropy (60) with the full result (47) (solid
curves) for closed boundary conditions; from left to right N = 136, 105, 83, and 67 (same parameters and colors as Figure 3).



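If in evaluating the one-sequence partition function, $Z^{\rm cl,LE}_{1{\rm seq}}$, we retain only the completely closed ($m = N$) and
completely open ($m = 0$) states, we obtain the two-state approximation:

$$\varphi^{\rm cl,LE}_{B,2{\rm st}} = \frac{1}{1 + \left\{(1 + N/D)^{-k}\exp\left[-\beta\Delta G_{\rm int}^{(N)}\right]\right\}^{-1}}. \qquad (59)$$

A more general s-state approximation can be defined by including the $m = 0, \ldots, s - 2$ terms in the sum (57). Without
loop entropy ($k = 0$) (59) simplifies to

$$\varphi^{\rm cl}_{B,2{\rm st}} = \frac{1}{2}\left\{1 - \tanh\left[\beta\Delta G_{\rm int}^{(N)}/2\right]\right\}. \qquad (60)$$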



In Figure 6 the 2-state approximation without loop entropy is compared to the exact result for the extended
case (47). We observe a cross-over temperature (at which the 2-state approximation begins to overestimate $\varphi_B^{\rm cl}$) roughly
given by the temperature at which $\Delta G_{\rm int}^{(N)}$ goes from positive to negative (signaling a vanishing "nucleation barrier"
for the completely open insert). Contrary to previous claims pQ, in the present case the two-state approximation
overestimates $T_m(N)$ by more than 2 K and underestimates the transition width.
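The two-state formulas can be checked in a few lines. The sketch below evaluates (59) as the weight of the fully open insert divided by the two-state partition function and confirms that for $k = 0$ it coincides with the tanh form (60); the parameter values are illustrative assumptions, and $J_0$, $L_0$ are taken as dimensionless inputs.

```python
import math

def phi_two_state(N, J0, L0, k=0.0, D=1.0):
    """Eq. (59): two-state open fraction, w/(1 + w) with w the open-state weight."""
    beta_dg = 4.0 * J0 + 2.0 * N * L0            # Eq. (48) with n = N
    w = (1.0 + N / D) ** (-k) * math.exp(-beta_dg)
    return w / (1.0 + w)

def phi_two_state_tanh(N, J0, L0):
    """Eq. (60): the k = 0 limit written with tanh."""
    return 0.5 * (1.0 - math.tanh((4.0 * J0 + 2.0 * N * L0) / 2.0))

N, J0 = 136, 3.0
L0_mid = -2.0 * J0 / N   # value of L0 at which beta*DeltaG_int^(N) vanishes
print(phi_two_state(N, J0, L0_mid), phi_two_state(N, J0, L0_mid, k=1.7, D=100.0))
```

Within the two-state picture the midpoint of the transition sits where $\beta\Delta G_{\rm int}^{(N)} = 0$, i.e. at $L_0 = -2J_0/N$, and switching on the loop entropy factor lowers the open fraction at fixed $L_0$.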
The form (58) suggests defining an effective total n-bubble free energy

$$\beta\Delta F_{\rm tot}^{(n)} = \beta\Delta G_{\rm int}^{(n)} - \ln(N - n + 1) + k\ln(1 + n/D) \qquad (61)$$

that accounts for the intrinsic free energy of bubble formation (first term), as well as positional (second term) and
loop entropy (third term). $\beta\Delta G_{\rm int}^{(n)}$ decreases with increasing $n$ for $T > T_m^\infty$ ($L_0 < 0$) and increases for $T < T_m^\infty$
($L_0 > 0$). The positional and loop entropy contributions increase the effective free energy cost of bubble creation as
the bubble size $n$ increases.

FIG. 7: Free energy of bubble formation for N = 136 and T = 339, 342, 345, 347 K (from top to bottom): (a) intrinsic free
energy, $\beta\Delta G_{\rm int}^{(n)}$; total free energy, $\beta\Delta F_{\rm tot}^{(n)}$: (b) $k = 0$ (without loop entropy); (c) $k = 1.7$, $D = 100$; (d) $k = 1.7$, $D = 1$
($f = 0$, other model parameters as in Fig. pi).
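To make the competition between the three terms of (61) concrete, the following sketch evaluates the total bubble free energy and locates the bubble size that minimizes it. The values of $J_0$ and $L_0$ are dimensionless and purely illustrative (they are not the fitted model parameters), and the function names are ours.

```python
import math

def beta_dg_int(n, J0, L0):
    """Eq. (48): intrinsic free energy of an n-bubble, in units of k_B T."""
    return 4.0 * J0 + 2.0 * n * L0

def beta_df_tot(n, N, J0, L0, k=0.0, D=1.0):
    """Eq. (61): intrinsic term, positional entropy and loop entropy."""
    return beta_dg_int(n, J0, L0) - math.log(N - n + 1) + k * math.log(1.0 + n / D)

# Illustrative case above the melting temperature of the infinite chain (L0 < 0).
N, J0, L0 = 136, 3.0, -0.05
n_star = min(range(1, N + 1), key=lambda n: beta_df_tot(n, N, J0, L0))
print("most favorable bubble size without loop entropy:", n_star)
```

By construction $\exp[-\beta\Delta F_{\rm tot}^{(n)}]$ reproduces the summand of (58), so the minimum of (61) marks the bubble size that dominates the one-sequence partition function. For these inputs the minimum lies close to, but below, $n = N$ because of the positional entropy term.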

In Fig. 7 we plot bubble free energies for N = 136 and increasingly important loop entropy effects. The intrinsic
part, $\Delta G_{\rm int}^{(N)}$, is a linearly decreasing function of $n$ and vanishes at T = 345 K, close to the temperature at which the
2-state approximation becomes an overestimation (see Fig. 7a). We observe that (i) inclusion of the positional entropy
alone (Fig. 7b) leads to a minimum in (61) near $n = N$ for sufficiently high temperatures and (ii) the loop entropy
rigidity parameter $D$ plays a minor role when it is close to 100 (Fig. 7c) and an important one when it is close to 1,
the value commonly used in the modeling of infinite chains (Fig. 7d). In the latter case (61) remains positive over the
whole temperature range studied and has a maximum for small $n$ and a minimum near $n = N$ for sufficiently high
temperatures.

In Fig. 8 we compare the one-sequence approximations with and without loop entropy (58) for short inserts obeying
closed boundary conditions. For J = 9.13 kJ/mol we find that for inserts without loop entropy the one-sequence
approximation is practically indistinguishable from the exact result (47) for N < 10000. Because loop entropy further
reduces the probability of bubbles, we therefore believe that the one-sequence approximation with loop entropy should
be an excellent approximation in most cases of practical interest (i.e., inserts with lengths less than a few thousand
base-pairs). We observe in Fig. 8 that for such inserts and fixed $L$ the net result of including the loop entropy is to
shift the melting curves to the right by about 10 K for D = 1 and about 2 K for D = 100 without much change
in the transition width. It therefore seems that the addition of loop entropy will not enable us to improve the fits to
experiment shown in Fig. 5.

Although it is possible to work out the details of the one-sequence approximation when the end base-pairs in an
insert of length N experience a chemical potential $\mu' < \infty$, we will not present these results here.



2. One-sequence approximation for free boundary conditions 

We now examine the one-sequence approximation with and without loop entropy for DNA homopolymers of length
N with free boundary conditions. Because most synthetic DNA homopolymers are less than a few thousand base-pairs
long [2, 42, 53, 64], the one-sequence approximation may be a useful and accurate simplified approach in such
cases. For free boundary conditions, besides single interior bubbles, we must include the possibility of single helical
sequences. The effective free energy of creating an interior n-bubble with two base-pair domain walls is given in (48);
the effective free energy of creating a single unzipped sequence of length $n$ starting at $i = 1$ or $i = N$ (with only one
base-pair domain wall) is [18]:

$$\beta\Delta G_{\rm end}^{(n)} = 2J_0 - K_0 + 2nL_0. \qquad (62)$$

The effective free energy for creating a single interior helical sequence of length $m = N - n$ (including neither the
$i = 1$ nor the $i = N$ base-pair) with two domain walls is [18]:

$$\beta\Delta G_{\rm helix}^{(m)} = 4J_0 - 2K_0 + 2(N - m)L_0. \qquad (63)$$



The effective free energy needed to completely denature the DNA chain of length N is $\beta\Delta G_{\rm open} = 2L_0N - 2K_0$.
The restricted one-sequence partition function for free boundary conditions, $Z^{\rm free}_{1{\rm seq}}$, includes contributions from (i) the
completely closed state (dsDNA), normalized to a weight of one, (ii) interior n-bubbles inserted in a domain of length
$N - 2$ varying in size between $n = 1$ and $N - 2$,

$$Z^{\rm B,int}_{1{\rm seq}} = \sum_{m=0}^{N-3}(m+1)\exp\left[-\beta\Delta G_{\rm int}^{(N-2-m)}\right], \qquad (64)$$

(iii) one unzipped end sequence of length $n$, with two-fold degeneracy,

$$Z^{\rm end}_{1{\rm seq}} = 2\sum_{n=1}^{N-1}\exp\left[-\beta\Delta G_{\rm end}^{(n)}\right], \qquad (65)$$

(iv) a single interior helical sequence,

$$Z^{\rm H,int}_{1{\rm seq}} = \sum_{m=1}^{N-2}(N - 1 - m)^{\epsilon}\exp\left[-\beta\Delta G_{\rm helix}^{(m)}\right], \qquad (66)$$

where $\epsilon = 1$ without chain sliding (for heteropolymers using average parameter values) and 2 with (for homopolymers
like polydA-polydT) [T2l 1551 [6"1], and (v) the completely open state (op),

$$Z^{\rm open}_{1{\rm seq}} = \exp\left[-\beta\Delta G_{\rm open}\right]. \qquad (67)$$

$Z^{\rm free}_{1{\rm seq}}$ can therefore be written as

$$Z^{\rm free}_{1{\rm seq}} = 1 + Z^{\rm end}_{1{\rm seq}} + Z^{\rm H,int}_{1{\rm seq}} + Z^{\rm B,int}_{1{\rm seq}} + Z^{\rm open}_{1{\rm seq}}. \qquad (68)$$

FIG. 9: The four DNA states accounted for in the one-sequence approximation for free polymers (aside from the dissociated
chains): (a) closed chain, (b) end unwinding, (c) internal helix, (d) internal bubble, corresponding, respectively, to the first four
terms in (68).

The four DNA states accounted for in the one-sequence approximation (aside from the dissociated chains) are shown
in Fig. 9.
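The free-end one-sequence sum is easy to evaluate directly. The sketch below adds up the five contributions entering (68) without loop entropy and returns the open fraction as a weighted average. The summation limits follow the state counting described in the text (bubbles of size 1 to $N - 2$, unzipped ends of length 1 to $N - 1$, interior helices of length 1 to $N - 2$); $J_0$, $K_0$ and $L_0$ are dimensionless inputs and the function name is ours.

```python
import math

def one_sequence_free(N, J0, K0, L0, eps=1):
    """Return (Z, phi_B) for the free-boundary one-sequence sum, Eqs. (62)-(68)."""
    z = 1.0        # (i) completely closed chain, weight one
    n_open = 0.0   # running sum of (number of open base-pairs) * weight
    # (ii) interior bubble of size n = N-2-m, with (m+1) possible positions
    for m in range(N - 2):
        n = N - 2 - m
        w = (m + 1) * math.exp(-(4 * J0 + 2 * n * L0))
        z += w
        n_open += n * w
    # (iii) unzipped end of length n, two-fold degenerate
    for n in range(1, N):
        w = 2 * math.exp(-(2 * J0 - K0 + 2 * n * L0))
        z += w
        n_open += n * w
    # (iv) interior helix of length m; eps = 2 would mimic chain sliding
    for m in range(1, N - 1):
        w = (N - 1 - m) ** eps * math.exp(-(4 * J0 - 2 * K0 + 2 * (N - m) * L0))
        z += w
        n_open += (N - m) * w
    # (v) completely open chain
    w = math.exp(-(2 * L0 * N - 2 * K0))
    z += w
    n_open += N * w
    return z, n_open / (N * z)

print(one_sequence_free(3, 0.0, 0.0, 0.0))
print(one_sequence_free(200, 1.6, 0.8, -0.01))
```

A useful sanity check is the limit $J_0 = K_0 = L_0 = 0$ with $\epsilon = 1$, where every weight is one and $Z$ simply counts states: for N = 3 the one-sequence states exhaust all $2^3$ configurations, whereas for N = 4 two configurations containing both a bubble and an unzipped end are missing.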



It is now easy to include loop entropy by inserting the loop entropy factor $g_{\rm LE}$ into the interior-bubble term of (68):

$$Z^{\rm B,int,LE}_{1{\rm seq}} = \sum_{m=0}^{N-3}(m+1)\,g_{\rm LE}(N-2-m)\exp\left[-\beta\Delta G_{\rm int}^{(N-2-m)}\right]. \qquad (69)$$

It is not possible now to simply readjust $J_0$ as was done for inserts, because unzipped end sequences "see" the
un-readjusted $J_0$. Unzipped end sequences are composed of two unbound chains joined at one end and therefore there
is no loop entropy factor in $Z^{\rm end}_{1{\rm seq}}$ or $Z^{\rm H,int}_{1{\rm seq}}$ (a small correction term for two such self-avoiding chains, however, has
been neglected, see [71]). We can, however, rewrite (69) as

$$Z^{\rm B,int,LE}_{1{\rm seq}} = \sum_{m=0}^{N-3}(m+1)\left[1 + (N-2-m)/D\right]^{-k}\exp\left[-\beta\Delta\tilde{G}_{\rm int}^{(N-2-m)}\right], \qquad (70)$$

where $\beta\Delta\tilde{G}_{\rm int}^{(n)}$ is the same as $\beta\Delta G_{\rm int}^{(n)}$ with $J_0$ replaced by

$$\tilde{J}_0 = J_0 + (k/4)\ln(2D) > J_0. \qquad (71)$$

It is then possible to define an effective loop initiation factor, $\tilde{\sigma}_{\rm LI} = e^{-4\tilde{J}_0} < \sigma_{\rm LI}$, that controls the probability of
bubble formation in the presence of loop entropy and depends on the readjusted value $\tilde{J}_0$ (although it is still $\sigma_{\rm LI}$ that
controls the probability of end unwinding and one internal helical section).

Within the free boundary condition one-sequence approximation the average fraction of broken base-pairs can be
obtained from $Z^{\rm free}_{1{\rm seq}}$ via the analog of (51),

$$\varphi^{\rm free}_{B,1{\rm seq}} = -\frac{1}{2N}\,\frac{\partial \ln Z^{\rm free}_{1{\rm seq}}}{\partial L_0}. \qquad (72)$$

When chain dissociation is taken into account the contribution from the completely open chain, $Z^{\rm open}_{1{\rm seq}}$, is dropped
from $Z^{\rm free}_{1{\rm seq}}$, which then becomes the internal partition function for associated chains (73).
The corresponding $\varphi_B$ is the fraction of broken base-pairs in associated chains (clearly a lower bound for the
experimentally measured total fraction of broken base-pairs, because the contribution of dissociated chains is neglected).
In this case the one-sequence approximation (73) incorporates the first four terms (of order 0, 1/2 and 1 for the last
two terms) in an expansion in powers of the loop initiation factor, $\sigma_{\rm LI}$ [the so-called zipper model neglects the last
(bubble) contribution] [42, 64]. The next higher term, neglected in (68) and of order 3/2, accounts for one internal
bubble with chain sliding. In most cases of practical interest there is little difference between using (68) and (73).

FIG. 10: Melting curves: comparison of the exact result with the one-sequence approximation (including completely open state)
for a free chain (no loop entropy, no sliding). Exact results (solid curves) from right to left near the upper part of the curves
(T > 339 K), N = 500, 2000, 10000; one-sequence approximation, N = 500 (long dashed curve), 2000 (intermediate dashed
curve), 10000 (short dashed curve), $f = 0.5$ and other parameters as in Figure 3. (a) J = 4.57 kJ/mol; (b) as in (a) but now
Linear-Log plot; (c) J = 9.13 kJ/mol; (d) as in (c) but now Linear-Log plot. In (c) and (d), the dashed and solid curves are
superposed.



The above one-sequence approximation should be valid for sufficiently short chains. After determining its range of
validity when loop entropy is neglected, we can then use it with confidence within this range to examine the influence
of loop entropy on DNA denaturation. In Fig. 10 we test the validity of the one-sequence approximation with neither
loop entropy, nor chain sliding by comparing it with the exact result (40), for which the partition function includes the
completely open state. From now on we fix the weighting factor $f$ at 0.5, which, as explained earlier, is close to
the one estimated from experiment. We observe that the one-sequence approximation is accurate when $N \lesssim 500$ for
J = 4.57 kJ/mol (Fig. 10a,b) and remains accurate up to at least $N \approx 10000$ for J = 9.13 kJ/mol (Fig. 10c,d); in both cases studied
the melting temperature is well reproduced, although the transition width is underestimated for J = 4.57 kJ/mol
when $N \gtrsim 500$ (with the discrepancy increasing with increasing $N$). The one-sequence approximation also somewhat
overestimates the temperature $T^*$ at which the melting curves intersect. We conclude that the limiting value of $N$ for
which the one-sequence approximation is accurate depends critically on the value of $J_0$ via the loop initiation factor $\sigma_{\rm LI}$ (50).

Because we are now interested in studying the effects of loop entropy on thermal denaturation, we employ the
smaller value for J (4.57 kJ/mol). Despite this smaller value, the inclusion of loop entropy reinforces the validity of
the one-sequence approximation. For J = 4.57 kJ/mol, $k = 1.7$, and $D = 100$ ($n_0 = 198$), the readjusted value $\tilde{J}$ (71)
is greater than 9.13 kJ/mol, implying that in this case bubbles are even more highly suppressed for J = 4.57 kJ/mol
with loop entropy than for J = 9.13 kJ/mol without loop entropy. In Fig. 11 we observe that at low temperature
the chain-sliding-only model gives the highest melting and the loop-entropy one the lowest. At higher temperature
the sliding-loop entropy model gives the highest melting. For the case considered in Fig. 11 we therefore expect the
accuracy of the one-sequence approximation to be comparable to that seen in Fig. 10c,d (and not Fig. 10a,b).
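The statement that the readjusted $\tilde{J}$ exceeds 9.13 kJ/mol can be checked with a few lines of arithmetic. The sketch below assumes that the logarithmic shift of (71), which is dimensionless, is converted to kJ/mol by multiplying by $RT$, and it assumes a temperature of 340 K, representative of the melting range discussed in this section; both choices are ours.

```python
import math

R_GAS = 8.314e-3   # kJ/(mol K)

def readjusted_j(J_kj, k, D, T):
    """J + (k/4) ln(2D) * R*T, i.e. Eq. (71) expressed in kJ/mol (assumed conversion)."""
    return J_kj + (k / 4.0) * math.log(2.0 * D) * R_GAS * T

k, D, T = 1.7, 100.0, 340.0      # T = 340 K is an assumed, representative value
n0 = 2.0 * D - 2.0               # from D = (n0 + 2) / 2
J_tilde = readjusted_j(4.57, k, D, T)
print(f"n0 = {n0:.0f}, J_tilde = {J_tilde:.2f} kJ/mol")
```

Under these assumptions the shift is a little over 6 kJ/mol, which puts $\tilde{J}$ close to 11 kJ/mol and therefore above 9.13 kJ/mol, as stated in the text.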



In Fig. 12 we plot the melting curves using the Loop Entropy-Sliding model for free chains of three different lengths
(N = 500, 2000, 10000) and compare them with the results obtained without loop entropy and sliding. We note that due to the
combined effects of sliding and loop entropy the melting temperature increases with increasing $N$ and the width of the
transition decreases (Fig. 12b), in agreement with experiment [1] (for $f = 0.5$ the temperature $T^*$ at which the melting
curves intersect is now greater than $T_m(N)$, the opposite of what occurs when loop entropy and sliding are neglected,
see Fig. 12a). The model prediction for the difference between the melting temperatures for N = 500 and 10000 is
about 0.5 K (the results for N > 10000 should be very close to the N = 10000 one). When chain dissociation is added
to the model, one can reasonably expect that the melting temperature for N = 500 will decrease by about 0.5 K
[12, 53] and that for N > 10000 will hardly change. This result suggests that once chain dissociation is incorporated
into the current model, it should be possible to account for the experimental results of [1] [$T_m(30000) - T_m(500) \approx 1$ K
and decreasing transition width as $N$ increases].

FIG. 11: Internal melting curves (associated chains): Comparison of various one-sequence approximations for a free chain with
N = 2000: (a) Linear plot, (b) Linear-Log plot; neither loop entropy nor sliding (long dashed curve), sliding only (intermediate
dashed curve), loop entropy only (short dashed curve), both loop entropy and sliding (solid curve) ($k = 1.7$, $n_0 = 198$, $f = 0.5$,
other parameters as in Figure 3).



IV. CONCLUDING REMARKS 



This paper presents the extension of a theoretical model of DNA denaturation jTTl [TB] that couples the base-pair 
states, unbroken or broken, and the chain configurational degrees of freedom. The elastic contributions are taken into 
account, arising from chain bending, torsional and stretching rigidities, the values of which depend on the neighboring 
base-pair states. The difference of bond lengths in ssDNA (0.34 nm) and dsDNA (0.71 nm) is also included in the 
Hamiltonian. This model, tackled by analytical means, provides new insight into the dependence of the effective Ising 
parameters, used in previous Ising-like models, on microscopic elastic moduli. The main conclusion is that all these 
features lead to a renormalization of the bare Ising parameters on the order of magnitude of the thermal energy. 
Hence, they cannot be ignored when relating microscopic properties, extracted for example from ab initio calculations 
or experiments on DNA fragments, to the collective properties of the whole chain measured, for instance, in single 
DNA molecule experiments (atomic force microscopy, optical and magnetic tweezers, tethered particle motion). As
an illustration, without considering the effects of stretching elasticity and base-pair length, the energy cost to open
a base-pair, $2\mu$, would be directly related to the same quantity measured with a force apparatus [67, 72]. But $\mu$
is renormalized by these effects and is lowered by 0.5 to 1 $k_BT$ when the bare value is close to 2 $k_BT$. The same
conclusion holds for the destacking, $J$, or stacking, $K$, parameters.

FIG. 12: Internal melting curves (associated chains) obtained using the Loop Entropy-Sliding model for free chains of three
different lengths: $N = 500$ (long dashed curve); 2000 (solid curve); 10000 (short dashed curve) with $J = 4.57$ kJ/mol, $k = 1.7$,
$n_0 = 198$, / = 0.5 (other parameters as in Figure 3): (a) with neither loop entropy, nor chain sliding; (b) with loop entropy
and chain sliding.
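To give a sense of scale for the renormalization of $\mu$ quoted above, the short calculation below converts the shifts into molar units. It assumes, purely for illustration, a temperature of $T = 339$ K (the melting temperature quoted in the Appendix for the $N = 30000$ chain); the labels $\mu_{\rm bare}$ and $\mu_{\rm eff}$ are introduced here and are not notation from the model.

```latex
\begin{align*}
k_B T \;&\to\; RT = 8.314\ \mathrm{J\,mol^{-1}\,K^{-1}} \times 339\ \mathrm{K} \approx 2.82\ \mathrm{kJ/mol}, \\
\mu_{\rm bare} &\approx 2\,k_B T \approx 5.64\ \mathrm{kJ/mol}, \\
\mu_{\rm eff} &\approx \mu_{\rm bare} - (0.5\ \text{to}\ 1)\,k_B T
  \approx (1\ \text{to}\ 1.5)\,k_B T \approx 2.8\ \text{to}\ 4.2\ \mathrm{kJ/mol}.
\end{align*}
```

Under these assumptions the correction amounts to a relative reduction of 25% to 50% of the bare value, which is why it cannot be treated as a small perturbation.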

In this work, we also analyze finite size effects. In particular the role of closed boundary conditions on melting 
curves for finite lengths is investigated in order to model clamped polydA-polydT DNA inserts. Two approximations
are considered: (i) the one-sequence approximation amounts to neglecting configurations with several bubbles and 
(ii) the two-state one keeps only the contributions from the completely closed and open chains pQ. In the range of 
parameters studied, the agreement with the exact result is excellent in case (i), whereas it is much less satisfactory 
in case (ii). We also incorporate loop entropy in case (i), which leads to an increase in $T_m$ that is
associated with the loop entropy cost and depends on the value of the loop entropy chain stiffness parameter $D$ (for
$N \sim 100$ there is a shift of 1 K for $D = 100$ and of 5 K for $D = 1$). Finally, we study free polymer chains using
exact results with neither loop entropy nor chain sliding and the one-sequence approximation with loop entropy and
chain sliding. Our major conclusion is that the experimentally observed increase in $T_m$ with increasing chain length
for homopolymers can be accounted for by incorporating both loop entropy and chain sliding into our model. The
simplicity of our method of incorporating loop entropy into the one-sequence approximation paves the way to a deeper 
study of the role of chain stiffness in the loop entropy factor, <?le- We underline that careful experiments on free and 
clamped homopolymers of different lengths (in solution or in single molecule experiments) would be extremely useful 
in elucidating the role of DNA finite size effects. 
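The difference between the two approximations discussed above can be illustrated on a toy problem. The sketch below is not the effective model of this paper: it assumes a plain two-state Ising chain without loop entropy, clamped by closed base-pairs at both ends, with hypothetical couplings `J` and `h` given in units of $k_BT$. It enumerates all configurations of a short insert and compares the exact fraction of open base-pairs with the one-sequence (at most one bubble) and two-state (fully closed or fully open) restrictions.

```python
import math
from itertools import product


def relative_weight(conf, J, h):
    """Boltzmann weight of an insert configuration (+1 closed, -1 open),
    clamped by closed base-pairs at both ends. J, h are in units of k_B T."""
    chain = (1,) + tuple(conf) + (1,)
    coupling = J * sum(a * b for a, b in zip(chain, chain[1:]))
    field = h * sum(conf)
    return math.exp(coupling + field)


def count_bubbles(conf):
    """Number of maximal runs of open base-pairs (-1)."""
    bubbles, previous = 0, 1
    for s in conf:
        if s == -1 and previous == 1:
            bubbles += 1
        previous = s
    return bubbles


def open_fraction(N, J, h, keep=lambda conf: True):
    """Average fraction of open base-pairs, summing only the configurations
    accepted by `keep` (brute-force enumeration, so N must stay small)."""
    z = 0.0
    weighted_open = 0.0
    for conf in product((1, -1), repeat=N):
        if not keep(conf):
            continue
        w = relative_weight(conf, J, h)
        z += w
        weighted_open += w * conf.count(-1) / N
    return weighted_open / z


def one_sequence(conf):
    """Keep configurations with at most one bubble."""
    return count_bubbles(conf) <= 1


def two_state(conf):
    """Keep only the fully closed and fully open configurations."""
    return conf.count(-1) in (0, len(conf))


if __name__ == "__main__":
    N, J, h = 10, 1.5, -0.1  # hypothetical illustrative values
    print("exact        ", open_fraction(N, J, h))
    print("one-sequence ", open_fraction(N, J, h, one_sequence))
    print("two-state    ", open_fraction(N, J, h, two_state))
```

For these illustrative parameters the one-sequence value stays close to the exact one, because each additional bubble costs an extra factor $e^{-4J}$, whereas the two-state value misses all partially open configurations.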

From an experimental perspective, our findings are relevant for free DNA in dilute solutions, without any constraint 
on chain configurations, nor any applied force or torque. An ingredient that we have not considered so far is the gain in
translational entropy due to strand separation in the case of dissociation [69]. A correct treatment of this mechanism
consists in writing a chemical equilibrium between completely denatured single strands and partially bound ones 
(work in progress). 

The case of constrained DNA is more involved. If a force or a torque is applied, for instance in tweezer experiments, 
rotational symmetry is lost in the Hamiltonian, which prevents an analytical solution of the problem. Numerical or 
approximate schemes, such as variational principles, may be used. Another interesting constraint concerns polymer 
looping [S3]. Circular DNAs appear in the case of transposons or insertion sequences pffl HO] , Writing down the 
polymer closure (e.g., for the determination of the J-factor) is a formidable task because it corresponds to the global 
constraint $\sum_i \mathbf{t}_i = 0$, formally equivalent to an applied force [12]. We can, however, partially take into account looping
in our framework by imposing periodic boundary conditions on the vectors $\mathbf{e}_{\mu,i}$ and/or on $\sigma_i$, instead of the end
condition $|V\rangle$. This can be handled using the transfer matrix method. In the case of superhelical twist, the polymer
winds one or several times around its tangent vectors $\mathbf{t}_i$. This condition can also be enforced via the boundary
conditions, by requiring that the appropriate combination of Euler angles acquires a phase multiple of $2\pi$ when going
from $i = N$ to $i = 1$. This topological constraint should lead to an increased fraction of denatured base-pairs, in
order to release the torsional energy cost, and consequently to an increased flexibility, thereby facilitating cyclization.
Our predictions for the end-to-end distance can also be compared to experiment, because R is proportional to the 
radius of gyration, which can be measured in viscosity experiments. 
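How a change of boundary conditions enters a transfer matrix calculation can be sketched on a much simpler system than the one treated here. The example below assumes a generic two-state Ising chain (no rotational degrees of freedom), with hypothetical couplings `J` and `h` in units of $k_BT$. Periodic boundary conditions turn the partition function into a trace of the $N$-th power of the transfer matrix, whereas free ends use an end vector (called `v` here, a stand-in for an end condition) contracted with the $(N-1)$-th power. Both are checked against brute-force enumeration.

```python
import math
from itertools import product

SPINS = (1, -1)  # +1: closed base-pair, -1: open base-pair


def transfer_matrix(J, h):
    """Symmetric 2x2 transfer matrix T[s][t] = exp(J*s*t + h*(s+t)/2)."""
    return [[math.exp(J * s * t + 0.5 * h * (s + t)) for t in SPINS] for s in SPINS]


def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def matpow(A, n):
    """n-th power of a 2x2 matrix by repeated multiplication."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        result = matmul(result, A)
    return result


def z_periodic(N, J, h):
    """Periodic boundary conditions: Z = Tr(T^N)."""
    P = matpow(transfer_matrix(J, h), N)
    return P[0][0] + P[1][1]


def z_free_ends(N, J, h):
    """Free ends: Z = v^T T^(N-1) v with end vector v_s = exp(h*s/2)."""
    P = matpow(transfer_matrix(J, h), N - 1)
    v = [math.exp(0.5 * h * s) for s in SPINS]
    return sum(v[a] * P[a][b] * v[b] for a in range(2) for b in range(2))


def z_brute_force(N, J, h, periodic):
    """Direct sum over all 2^N configurations, used only as a check."""
    total = 0.0
    for conf in product(SPINS, repeat=N):
        bonds = sum(conf[i] * conf[i + 1] for i in range(N - 1))
        if periodic:
            bonds += conf[-1] * conf[0]
        total += math.exp(J * bonds + h * sum(conf))
    return total
```

In this toy setting the only change between the two cases is the final contraction (trace versus end vectors); the matrix itself is untouched, which is the property that makes boundary conditions a convenient place to encode looping constraints.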

All the results presented in this paper concern homopolynucleotides and the numerical applications focused on 
polydA-polydT. This work can, however, be generalized to heteropolymers, although a minimal amount of numerical
work is necessary to handle the reduction of the transfer matrices. Nonetheless, a numerical study of heteropolymers 
would require the knowledge of the microscopic elastic moduli, which are far from being known with any certainty for 
any pair of the four nucleotides A, T, G and C. 



Appendix 

In this Appendix we extract smooth melting curves from the experimental data in [T]. For the poly dA-dT DNA 
polymer with free ends and 30000 base-pairs we have used the temperature derivative of 



$$\varphi_B = \frac{c_f}{2}\left[1 - \frac{\sinh(-a_f + \beta\mu_f)}{\sqrt{e^{-4\beta J_f} + \sinh^2(-a_f + \beta\mu_f)}}\right] \qquad (74)$$

where $c_f$, $a_f$, and $\mu_f$ are fitting parameters (simplified $N \to \infty$ Ising form); this functional form arises in simple Ising
models of DNA denaturation [53].

FIG. 13: Absorbance temperature derivative (un-normalized $d\varphi_B/dT$) vs. temperature: experimental data points [T] and
un-normalized fitted functions (green curves, left y-axis); UV absorbance (un-normalized fraction of broken base-pairs, $\varphi_B$),
vs. temperature (red curves, right y-axis) (from left to right: $N = 30000$, 136, 105, 83, and 67).
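As a rough illustration of how an infinite-chain Ising fitting form of this kind behaves, the sketch below evaluates it and its temperature derivative numerically. It assumes the form $\varphi_B = (c_f/2)\,[1 - \sinh L / \sqrt{e^{-4\beta J_f} + \sinh^2 L}]$ with $L = -a_f + \beta\mu_f$ and $\beta = 1/(RT)$; this reading of the fitting function, the function names, and all parameter values are assumptions made here for illustration and are not fitted values from the paper.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol K)


def phi_b_ising(T, c_f, a_f, mu_f, J_f):
    """Assumed infinite-chain Ising form for the (un-normalized) fraction of
    broken base-pairs. mu_f and J_f are in kJ/mol, a_f is dimensionless."""
    beta = 1.0 / (R * T)
    L = -a_f + beta * mu_f          # effective field, vanishes at T = mu_f / (R a_f)
    s = math.sinh(L)
    return 0.5 * c_f * (1.0 - s / math.sqrt(math.exp(-4.0 * beta * J_f) + s * s))


def dphi_dT(T, step=1e-3, **params):
    """Central finite-difference estimate of d(phi_B)/dT."""
    return (phi_b_ising(T + step, **params) - phi_b_ising(T - step, **params)) / (2.0 * step)


if __name__ == "__main__":
    # Hypothetical parameters chosen so that the midpoint sits at 339 K.
    params = dict(c_f=1.0, a_f=10.0, mu_f=10.0 * R * 339.0, J_f=4.57)
    for T in (335.0, 338.0, 339.0, 340.0, 343.0):
        print(T, phi_b_ising(T, **params), dphi_dT(T, **params))
```

With this form the midpoint of the transition is where the effective field vanishes, $T = \mu_f/(R\,a_f)$, and the sharpness of the derivative peak is controlled by $e^{-4\beta J_f}$: the larger $J_f$, the narrower the transition.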



For A-T inserts we have used the temperature derivative of 



$$\varphi_B = \frac{c_f}{2h_f}\left\{1 - \tanh[h_f(T_f - T)]\right\} \qquad (75)$$

where $c_f$, $h_f$, and $T_f$ are fitting parameters; this functional form arises in a two-state treatment of simple Ising
models of DNA denaturation [42] (the use of the two-state form here to extract smooth experimental melting curves
does not imply that the two-state approximation is a valid one, see Fig. 6). As shown in Fig. 13, the areas under
the fitted $d\varphi_B/dT$ functions are not normalized to one. We thus assume that the normalized fitted $\varphi_B$ functions
(Fig. 14) represent a good approximation to the fraction of open base-pairs for the A-T segments. By examining
Fig. 13 we see that this assumption is well borne out for the A-T inserts, but less so for the $N = 30000$ base-pair
chain because of difficulties in reading the data off the experimental curve and the asymmetry of this curve. Our
choice of fitting functions gives curves that are symmetric about the melting temperature and thus cannot account for the
observed asymmetry for $N = 30000$. The observed asymmetry probably cannot be explained by loop entropy and
chain sliding (for infinite chains at least) because when they are included in the model, the melting curves become
flatter to the left of the melting temperature and steeper to the right, the opposite of what is observed in Fig. 13 (for
finite chains, however, the combined effects of loop entropy and chain sliding can be different, see Fig. 11). Although
the $N = 30000$ base-pair chain melting temperature $\sim 339$ K is well reproduced, the width of the transition appears
to be overestimated. The general trend is for both the melting temperature and transition width to decrease with
increasing $N$. As the length of the insert increases the melting should tend to the infinite free chain result.
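The normalization step used for the A-T inserts can be made explicit with a short calculation. It assumes that the two-state fitting form reads $\varphi_B(T) = \frac{c_f}{2h_f}\{1 - \tanh[h_f(T_f - T)]\}$ with $h_f > 0$; the symbol $\varphi_B^{\rm norm}$ is a label introduced here for the normalized function.

```latex
\begin{align*}
\frac{d\varphi_B}{dT} &= \frac{c_f}{2}\,\mathrm{sech}^2\!\left[h_f(T_f - T)\right], \\
\int_{-\infty}^{+\infty} \frac{d\varphi_B}{dT}\,dT
  &= \varphi_B(+\infty) - \varphi_B(-\infty) = \frac{c_f}{h_f}, \\
\varphi_B^{\rm norm}(T) &= \frac{h_f}{c_f}\,\varphi_B(T)
  = \frac{1}{2}\left\{1 - \tanh\!\left[h_f(T_f - T)\right]\right\}.
\end{align*}
```

In this reading, $T_f$ is the temperature at which half of the base-pairs are open, $c_f/2$ is the peak height of the derivative, and $1/h_f$ sets the transition width. The area under the fitted derivative equals $c_f/h_f$, so dividing by this factor yields a function that runs from 0 to 1.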
FIG. 14: Normalized functions fitted to the experimental data [T]: (a) fraction of broken base-pairs vs. temperature, $\varphi_B$; (b)
$d\varphi_B/dT$ vs. temperature (from left to right, $N = 30000$, 136, 105, 83, and 67).